Philosophy, Psychology, and Artificial Intelligence
Edited by John Haugeland, Carl F. Craver, and Colin Klein
The MIT Press
Cambridge, Massachusetts
London, England
© 2023 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.
Library of Congress Cataloging-in-Publication Data
Names: Haugeland, John, 1945-2010, editor. | Craver, Carl F., editor. | Klein, Colin, 1979- editor.
Title: Mind design III: philosophy, psychology, and artificial intelligence / edited by John Haugeland, Carl F. Craver, and Colin Klein.
Other titles: Mind design 3
Description: Cambridge, Massachusetts: The MIT Press, [2023] | Includes bibliographical references.
Identifiers: LCCN 2023007001 (print) | LCCN 2023007002 (ebook) | ISBN 9780262546577 (paperback) | ISBN 9780262376570 (epub) | ISBN 9780262376563 (pdf)
Subjects: LCSH: Artificial intelligence. | Cognitive psychology.
Classification: LCC Q335.5.M492 2023 (print) | LCC Q335.5 (ebook) | DDC 006.3–dc23/eng20230715
LC record available at https://
LC ebook record available at https://
JH: for Barbara and John III
CFC: for Dar and Frank
CK: for Bennett and Tony
Carl F. Craver and Colin Klein
2023
Mind Design II was published in 1997. In the quarter century since, computer scientists have hit many of the old target objectives for successful artificial intelligence (AI). Computers have beaten chess grandmasters and 9-dan Go professionals. Face and voice recognition skills once beyond the capacity of supercomputers can now be found on commodity smartphones. The first driverless cars have appeared on the streets of major cities. An autonomous robot is currently vacuuming the kitchen.
Do these achievements indicate that artificial intelligence is or will soon achieve its aim? Have we made machines that think? Some of these successes, and many more besides, are due to massive theoretical advances in computer science. They rely on techniques and algorithms that were barely in their infancy twenty-five years ago. Yet a more skeptical person might retort that these tools have simply made it easier to pull off mindless tricks. The same period of success, after all, has also witnessed geometric growth in processor speeds, huge strides in miniaturization, standardization of platforms for machine learning, and the ability to handle previously unimaginably large training data sets. Perhaps the best explanation for the success of AI has more to do with these instrumental developments than with theoretical insight into how minds can be implemented in machines. Engineering has made incredible strides. But have we made progress in understanding and building minds? What really has changed?
The present volume is an invitation to reflect upon this question. As in previous editions of John Haugeland’s Mind Design, we have collected here a mix of classic and contemporary articles that address the nature of computation, the nature of thought, and the question of whether (and, if so, how) computers can be made to think. As with the second edition of Mind Design, ours carries over some pieces and introduces new ones. We wanted to make a book with a readable font that could still be carried in a backpack. We also aimed for a more inclusive reading list. This led to some difficult decisions about what had to be cut or abridged.
For example, we have removed Timothy van Gelder’s insightful defense of a dynamical approach to cognition and Paul Churchland’s argument that connectionist architectures ought to supplant propositional theories of mind and epistemology. We have also removed the vigorous debates about whether connectionist architectures can do the kind of symbolic processing that propositional thought required. We have cut Hubert Dreyfus’s classic discussion of what computers can’t do, which itself was based on extrapolations from then-current pain points in AI research.
We have not cut this material because we take the issues to be settled. Rather, we did so because the hard contrasts between dynamic and representational, or between connectionist and symbolic architectures, have blurred with time. The key debates no longer seem to us to travel along those lines. Further, good philosophy should always keep an eye on the actual as it thinks about what is possible. Many of the old arguments seemed to us to turn on particular architectures and programming styles that have since been supplanted by more powerful techniques. It remains of historical interest that PDP networks struggled with certain kinds of syntactic transformations, but a quick look at Google Translate will show that these particular issues have been solved. As Melanie Mitchell notes in her contribution to this volume, new problems have cropped up in the meantime; they deserve our attention.
In return for these cuts, we hope that the additions will represent a broader range of contemporary, philosophically relevant thinking about AI. Some of the additions—like the excerpt from David Marr’s Vision—are from older works, but they have become entrenched classics, cited across a range of disciplines. Similarly so with Margaret Boden’s response to John Searle, which is often mentioned but has become difficult to find. Others are newer commentaries on older issues: Stuart Russell on the nature of rationality, for example, or Barbara Webb’s update on neurally inspired robotics.
The bulk of the additions, however, come from adding material that reflects the explosive growth and diversification of AI approaches. We have added new chapters that discuss advances in deep neural networks, reinforcement learning, and causal learning. Indeed, if one of the overarching themes of Mind Design II was the fight between old-fashioned symbolic AI and simple neural networks, then the main lesson of Mind Design III might be how much the field has settled into a kind of pragmatic pluralism—a willingness to mix and match techniques according to domains and aims. Whether this pluralism will continue to make progress remains to be seen.
Alongside the advancement of AI, the explosive growth of the cognitive neurosciences in the 1990s and 2000s also changed the way that we think about cognitive machinery. Mind Design II was assembled in the heyday of nonreductive physicalism. It was a time when most philosophers of mind agreed that learning about the brain would give little insight into the mind; indeed, part of the attraction of computationalism was that it allowed you to abstract away from implementational details. Nonreductive physicalism, at least of that particularly austere sort, is no longer the dominant position. We are firmly on the side of an integrative approach. Outside of philosophy, the hard boundaries between cognition and implementation have fallen: computational neuroscience now coexists alongside neurally inspired AI. As such, we’ve also added several chapters that reflect how neuroscience and AI have come to complement one another.
Rather than the broadly chronological ordering of previous editions, this third edition is divided into six thematic parts. Each part begins with a stand-alone introduction designed to place these essays in philosophical and scientific context. We wrote these both for instructors, who might use the discussions to flesh out syllabi around the various areas of interest, and for general readers, who might want to dive deeper into one or more of these topics. The selections in the main text were chosen to give a mix of historical and contemporary relevance, with a focus on building syllabi. On its own, this reader should provide the basis for a solid mid- or upper-level undergraduate course on the challenges and limits of computational approaches to the mind. The introduction to each part has an annotated bibliography, which we hope readers can use as a starting point when going further than the included essays. Those bibliographies still include only a fragment of what is available—we intend them as starting points, not as comprehensive lists.
The choice of six topical areas also required leaving out whole domains of interesting work, including some that we had initially hoped to cover. We have not discussed the likelihood of superhuman intelligence or the arrival of the “singularity.” Nor do we discuss AI ethics. That field is growing so rapidly that it merits a collection of its own, and changing so rapidly that any collection would be outdated before it arrived at the printer. We have kept the focus on intelligence, so we do not consider whether machines can also be conscious. The hard problem of consciousness seems to us a very difficult one—one made neither more nor less tractable by the construction of good computational models of mind. It is also a problem about which there are other very good volumes. Finally, AI is a field that moves quickly. Transformer architectures (such as those that power ChatGPT) became widespread while the volume was in preparation. Rather than delay it further by chasing new developments, we can only hope that the chapters included have hit the right balance between contemporary relevance and philosophical timelessness.
Assembling a reader has been an unexpectedly complicated task. We appreciate the help that we received along the way. From the start, we thank Joan Wellman for permitting us the honor of continuing John Haugeland’s edited collection into a new century. Phil Laughlin at the MIT Press was enthusiastic about our pitch and helped to shepherd the volume through the COVID-19 pandemic. Peter Clutton did the detective work on copyright permissions. Ge Feng helped with converting the chapters from previous editions. Pamela Speh assisted with duplicating figures. Andre Santana assisted with the formatting of texts and figures, as well as with preparing the bibliography. Jeremy Strasser assisted with proofreading and typesetting. Nicholas Carroll assisted with final copyedits. Many thanks to the students in an intermural course (WUSTL and UMSL), co-taught with Gualtiero Piccinini, on “Computation and Cognition”; they helped to determine what was working and what was not.
Finally, we owe a tremendous debt to the late John Haugeland. Earlier editions of Mind Design didn’t just collect existing work—they defined a whole subfield, and established philosophy of AI as a respectable topic in its own right. Both of us had our intellectual trajectories shaped by earlier editions of this volume. Updating it has been a labor of love, and we hope above all that this new edition will introduce Haugeland’s achievement, as well as his inimitable philosophical style and voice, to a new generation.
The core conviction motivating artificial intelligence (AI) is that computers have all the raw ingredients they need to make minds. This is one reason why one might hope to study what minds are—and how biological systems make minds possible—by attempting to build intelligent systems out of computing systems. This motivating conviction depends, in the first instance, on an understanding of what computers are.
Haugeland’s original introduction to Mind Design II, given in chapter 2, lays out a theory of what computers are (automatic formal systems) and articulates the central challenges for AI. His essay provides an excellent introduction to the topics of this volume and to the key concepts one should master: What is a computer? What is a formal system? What is the difference between digital and analog? What is intentionality? This essay also contains, in primitive form, Haugeland’s pithy challenge: “The problem with computers is that they don’t give a damn.” What, we can ask, would it take to build a system that actually gives a damn?
In chapter 3, Newell and Simon, pioneers of computer science and AI, advance a key empirical hypothesis at the heart of AI and, as a consequence, of computational theories of mind more generally: the Physical Symbol System Hypothesis. According to them, physical symbol systems are necessary to produce intelligent action. Given resources and appropriate programming, they are also sufficient to produce intelligent action. These two claims are foundational for much of cognitive science. They raise the questions: What kinds of physical system do interesting things without manipulating symbols? What might cognition be if it is not, at base, symbol manipulation?
David Marr is well known for his classic articulation of the links between computational explanation, cognitive science, and neuroscience. The essay in chapter 4, extracted from his book, Vision (1982), introduces his now-classic tripartite division of levels of analysis: the computational level (describing what the system does and why), the algorithmic level (describing the representational system and rules for processing) and the implementational level (describing the hardware that instantiates the algorithm). He explains how this division of labor applies to his own work on a core task of vision: that of extracting shapes of objects from patterns of shading on the retina. Marr famously argues that computational-level understanding of a system is autonomous from the algorithm in which that computation is instantiated and the (biological or artificial) mechanism by which it is implemented. Marr’s essay is important not only because it has become stock in trade among computational cognitive scientists, but because it has been used to frame debates about the relative import of these different levels of explanation for cognitive science.
In the final essay of this part, Corey Maley takes up the important and often-neglected topic of analog computation. As he notes, discussions of computation often focus narrowly on digital computers, neglecting a large portion of the space of computational possibilities. The contrast between digital and analog, as well as the distinctive features of analog computation, make them especially important for understanding connectionist, dynamic, and embodied theses that are explored in later parts of the book.
The articles in this part can usefully be combined with those in part II, especially Turing’s “Computing Machinery and Intelligence” (chapter 6), which articulates his own account of computing while providing a very clear introduction to the Turing Machine, his exemplar of a computer. Readers might also benefit from exploring, in combination with part V, how this core idea of computation can or cannot be stretched to apply to biological/neural computing systems (Churchland and Sejnowski, in chapter 20) and to the kinds of computers that are, at least on their surface, very unlike Turing Machines (e.g., Rumelhart, in chapter 19, and Clark, in chapter 17). Readers might also consider how the notion of computation can or cannot be combined with embodied and embedded approaches to cognition (e.g., Haugeland, in chapter 22, and Brooks, in chapter 23) in part VI.
We chose the essays in this part largely for their foundational import. Yet the topic of computation and its contrasts has developed extensively and seen considerable nuanced refinement over the past two decades. A fundamental assumption of at least many contributors to this discussion is that there is an objective distinction between systems that really compute and systems that do not, and hence that computers form a natural kind. The effort to provide a satisfactory definition of computers and computation faces the twin obstacles of triviality (i.e., that everything counts as a computer) and rigidity (i.e., the exclusion by definition of systems like brains that are widely accepted as computing).
The Triviality Challenge. The triviality challenge (sometimes referred to as the “spectre of pancomputationalism”) initially arose as a problem for simple mapping theories of computation. It was raised by Putnam (1991) and Searle (1992) as part of more general criticisms of the computational theory of mind. The starkest forms of the triviality challenge claim that any physical system implements every computable function, so long as you look at it in the right way. The more things that count as computing, the less interesting it is to call anything in particular a “computer.” If a pail of water or a solid wall computes, then the fact that the brain also computes doesn’t tell us much.
What Is Computation? Piccinini and Maley (2021) provide a succinct overview of different views of computation, the desiderata for selecting among them, and their relative strengths and weaknesses. They group these into several different camps:
Marr’s Levels of Analysis. Those looking for a reflective treatment of Marr’s levels and their continued significance for computer science and computational biology, cognitive science, and neuroscience might consider Bechtel and Shagrir (2015), who provide their own interpretation of Marr’s levels of analysis and argue that each provides a nonredundant perspective required to explain information processing mechanisms. Churchland and Sejnowski (chapter 20) offer their own take on the nature and significance of Marr’s levels for our understanding of the relationship between minds and computers and between computers and other physical systems.
Analog Computation. Many foundational disputes about the nature of computation turn on a difference between digital and analog computers. Those interested in reading more about this distinction might consult the following:
John Haugeland
1996
MIND DESIGN is the endeavor to understand mind (thinking, intellect) in terms of its design (how it is built, how it works). It amounts, therefore, to a kind of cognitive psychology. But it is oriented more toward structure and mechanism than toward correlation or law, more toward the “how” than the “what”, than is traditional empirical psychology. An “experiment” in mind design is more often an effort to build something and make it work, than to observe or analyze what already exists. Thus, the field of artificial intelligence (AI), the attempt to construct intelligent artifacts, systems with minds of their own, lies at the heart of mind design. Of course, natural intelligence, especially human intelligence, remains the final object of investigation, the phenomenon eventually to be understood. What is distinctive is not the goal but rather the means to it. Mind design is psychology by reverse engineering.
Though the idea of intelligent artifacts is as old as Greek mythology, and a familiar staple of fantasy fiction, it has been taken seriously as science for scarcely two generations. And the reason is not far to seek: pending several conceptual and technical breakthroughs, no one had a clue how to proceed. Even as the pioneers were striking boldly into the unknown, much of what they were really up to remained unclear, both to themselves and to others; and some still does. Accordingly, mind design has always been an area of philosophical interest, an area in which the conceptual foundations—the very questions to ask, and what would count as an answer—have remained unusually fluid and controversial.
The essays collected here span the history of the field since its inception (though with emphasis on more recent developments). The authors are about evenly divided between philosophers and scientists. Yet, all of the essays are “philosophical”, in that they address fundamental issues and basic concepts; at the same time, nearly all are also “scientific” in that they are technically sophisticated and concerned with the achievements and challenges of concrete empirical research. Several major trends and schools of thought are represented, often explicitly disputing with one another. In their juxtaposition, therefore, not only the lay of the land, its principal peaks and valleys, but also its current movement, its still active fault lines, can come into view.
By way of introduction, I shall try in what follows to articulate a handful of the fundamental ideas that have made all this possible.
None of the present authors believes that intelligence depends on anything immaterial or supernatural, such as a vital spirit or an immortal soul. Thus, they are all materialists in at least the minimal sense of supposing that matter, suitably selected and arranged, suffices for intelligence. The question is: How?
It can seem incredible to suggest that mind is “nothing but” matter in motion. Are we to imagine all those little atoms thinking deep thoughts as they careen past one another in the thermal chaos? Or, if not one by one, then maybe collectively, by the zillions? The answer to this puzzle is to realize that things can be viewed from different perspectives (or described in different terms)—and, when we look differently, what we are able to see is also different. For instance, what is a coarse weave of frayed strands when viewed under a microscope is a shiny silk scarf seen in a store window. What is a marvellous old clockwork in the eyes of an antique restorer is a few cents’ worth of brass, seen as scrap metal. Likewise, so the idea goes, what is mere atoms in the void from one point of view can be an intelligent system from another.
Of course, you can’t look at anything in just any way you please—at least, not and be right about it. A scrap dealer couldn’t see a wooden stool as a few cents’ worth of brass, since it isn’t brass; the antiquarian couldn’t see a brass monkey as a clockwork, since it doesn’t work like a clock. Awkwardly, however, these two points taken together seem to create a dilemma. According to the first, what something is—coarse or fine, clockwork or scrap metal—depends on how you look at it. But, according to the second, how you can rightly look at something (or describe it) depends on what it is. Which comes first, one wants to ask, seeing or being?
Clearly, there’s something wrong with that question. What something is and how it can rightly be regarded are not essentially distinct; neither comes before the other, because they are the same. The advantage of emphasizing perspective, nevertheless, is that it highlights the following question: What constrains how something can rightly be regarded or described (and thus determines what it is)? This is important, because the answer will be different for different kinds of perspective or description—as our examples already illustrate. Sometimes, what something is is determined by its shape or form (at the relevant level of detail); sometimes it is determined by what it’s made of; and sometimes by how it works or even just what it does. Which—if any—of these could determine whether something is (rightly regarded or described as) intelligent?
In 1950, the pioneering computer scientist A. M. Turing suggested that intelligence is a matter of behavior or behavioral capacity: whether a system has a mind, or how intelligent it is, is determined by what it can and cannot do. Most materialist philosophers and cognitive scientists now accept this general idea (though John Searle is an exception). Turing also proposed a pragmatic criterion or test of what a system can do that would be sufficient to show that it is intelligent. (He did not claim that a system would not be intelligent if it could not pass his test; only that it would be if it could.) This test, now called the Turing test, is controversial in various ways, but remains widely respected in spirit.
Turing cast his test in terms of simulation or imitation: a nonhuman system will be deemed intelligent if it acts so like an ordinary person in certain respects that other ordinary people can’t tell (from these actions alone) that it isn’t one. But the imitation idea itself isn’t the important part of Turing’s proposal. What’s important is rather the specific sort of behavior that Turing chose for his test: he specified verbal behavior. A system is surely intelligent, he said, if it can carry on an ordinary conversation like an ordinary person (via electronic means, to avoid any influence due to appearance, tone of voice, and so on).
This is a daring and radical simplification. There are many ways in which intelligence is manifested. Why single out talking for special emphasis? Remember: Turing didn’t suggest that talking in this way is required to demonstrate intelligence, only that it’s sufficient. So there’s no worry about the test being too hard; the only question is whether it might be too lenient. We know, for instance, that there are systems that can regulate temperatures, generate intricate rhythms, or even fly airplanes without being, in any serious sense, intelligent. Why couldn’t the ability to carry on ordinary conversations be like that?
Turing’s answer is elegant and deep: talking is unique among intelligent abilities because it gathers within itself, at one remove, all others. One cannot generate rhythms or fly airplanes “about” talking, but one certainly can talk about rhythms and flying—not to mention poetry, sports, science, cooking, love, politics, and so on—and, if one doesn’t know what one is talking about, it will soon become painfully obvious. Talking is not merely one intelligent ability among others, but also, and essentially, the ability to express intelligently a great many (maybe all) other intelligent abilities. And, without having those abilities in fact, at least to some degree, one cannot talk intelligently about them. That’s why Turing’s test is so compelling and powerful.
On the other hand, even if not too easy, there is nevertheless a sense in which the test does obscure certain real difficulties. By concentrating on conversational ability, which can be exhibited entirely in writing (say, via computer terminals), the Turing test completely ignores any issues of real-world perception and action. Yet these turn out to be extraordinarily difficult to achieve artificially at any plausible level of sophistication. And, what may be worse, ignoring real-time environmental interaction distorts a system designer’s assumptions about how intelligent systems are related to the world more generally. For instance, if a system has to deal or cope with things around it, but is not continually tracking them externally, then it will need somehow to “keep track of” or represent them internally. Thus, neglect of perception and action can lead to an overemphasis on representation and internal modeling.
“Intentionality”, said Brentano (1874/1973), “is the mark of the mental.” By this he meant that everything mental has intentionality, and nothing else does (except in a derivative or second-hand way), and, finally, that this fact is the definition of the mental. ‘Intentional’ is used here in a medieval sense that harks back to the original Latin meaning of “stretching toward” something; it is not limited to things like plans and purposes, but applies to all kinds of mental acts. More specifically, intentionality is the character of one thing being “of” or “about” something else, for instance by representing it, describing it, referring to it, aiming at it, and so on. Thus, intending in the narrower modern sense (planning) is also intentional in Brentano’s broader and older sense, but much else is as well, such as believing, wanting, remembering, imagining, fearing, and the like.
Intentionality is peculiar and perplexing. It looks on the face of it to be a relation between two things. My belief that Cairo is hot is intentional because it is about Cairo (and/or its being hot). That which an intentional act or state is about (Cairo or its being hot, say) is called its intentional object. (It is this intentional object that the intentional state “stretches toward”.) Likewise, my desire for a certain shirt, my imagining a party on a certain date, my fear of dogs in general, would be “about”—that is, have as their intentional objects—that shirt, a party on that date, and dogs in general. Indeed, having an object in this way is another way of explaining intentionality; and such “having” seems to be a relation, namely between the state and its object.
But, if it’s a relation, it’s a relation like no other. Being-inside-of is a typical relation. Now notice this: if it is a fact about one thing that it is inside of another, then not only that first thing, but also the second has to exist; X cannot be inside of Y, or indeed be related to Y in any other way, if Y does not exist. This is true of relations quite generally; but it is not true of intentionality. I can perfectly well imagine a party on a certain date, and also have beliefs, desires, and fears about it, even though there is (was, will be) no such party. Of course, those beliefs would be false, and those hopes and fears unfulfilled; but they would be intentional—be about, or “have”, those objects—all the same.
It is this puzzling ability to have something as an object, whether or not that something actually exists, that caught Brentano’s attention. Brentano was no materialist: he thought that mental phenomena were one kind of entity, and material or physical phenomena were a completely different kind. And he could not see how any merely material or physical thing could be in fact related to another, if the latter didn’t exist; yet every mental state (belief, desire, and so on) has this possibility. So intentionality is the definitive mark of the mental.
Daniel C. Dennett accepts Brentano’s definition of the mental, but proposes a materialist way to view intentionality. Dennett, like Turing, thinks intelligence is a matter of how a system behaves; but, unlike Turing, he also has a worked-out account of what it is about (some) behavior that makes it intelligent—or, in Brentano’s terms, makes it the behavior of a system with intentional (that is, mental) states. The idea has two parts: (i) behavior should be understood not in isolation but in context and as part of a consistent pattern of behavior (this is often called “holism”); and (ii) for some systems, a consistent pattern of behavior in context can be construed as rational (such construing is often called “interpretation”).1
Rationality here means: acting so as best to satisfy your goals overall, given what you know and can tell about your situation. Subject to this constraint, we can surmise what a system wants and believes by watching what it does—but, of course, not in isolation. From all you can tell in isolation, a single bit of behavior might be manifesting any number of different beliefs and/or desires, or none at all. Only when you see a consistent pattern of rational behavior, manifesting the same cognitive states and capacities repeatedly, in various combinations, are you justified in saying that those are the states and capacities that this system has, or even that it has any cognitive states or capacities at all. “Rationality”, Dennett says (1971, p. 19), “is the mother of intention.”
This is a prime example of the above point about perspective. The constraint on whether something can rightly be regarded as having intentional states is, according to Dennett, not its shape or what it is made of, but rather what it does—more specifically, a consistently rational pattern in what it does. We infer that a rabbit can tell a fox from another rabbit, always wanting to get away from the one but not the other, from having observed it behave accordingly time and again, under various conditions. Thus, on a given occasion, we impute to the rabbit intentional states (beliefs and desires) about a particular fox, on the basis not only of its current behavior but also of the pattern in its behavior over time. The consistent pattern lends both specificity and credibility to the respective individual attributions.
Dennett calls this perspective the intentional stance and the entities so regarded intentional systems. If the stance is to have any conviction in any particular case, the pattern on which it depends had better be broad and reliable; but it needn’t be perfect. Compare a crystal: the pattern in the atomic lattice had better be broad and reliable, if the sample is to be a crystal at all; but it needn’t be perfect. Indeed, the very idea of a flaw in a crystal is made intelligible by the regularity of the pattern around it; only insofar as most of the lattice is regular, can particular parts be deemed flawed in determinate ways. Likewise for the intentional stance: only because the rabbit behaves rationally almost always, could we ever say on a particular occasion that it happened to be wrong—had mistaken another rabbit (or a bush, or a shadow) for a fox, say. False beliefs and unfulfilled hopes are intelligible as isolated lapses in an overall consistent pattern, like flaws in a crystal. This is how a specific intentional state can rightly be attributed, even though its supposed intentional object doesn’t exist—and thus is Dennett’s answer to Brentano’s puzzle.
Many material things that aren’t intentional systems are nevertheless “about” other things—including, sometimes, things that don’t exist. Written sentences and stories, for instance, are in some sense material; yet they are often about fictional characters and events. Even pictures and maps can represent nonexistent scenes and places. Of course, Brentano knew this, and so does Dennett. But they can say that this sort of intentionality is only derivative. Here’s the idea: sentence inscriptions—ink marks on a page, say—are only “about” anything because we (or other intelligent users) mean them that way. Their intentionality is second-hand, borrowed or derived from the intentionality that those users already have.
So, a sentence like “Santa lives at the North Pole”, or a picture of him or a map of his travels, can be “about” Santa (who, alas, doesn’t exist), but only because we can think that he lives there, and imagine what he looks like and where he goes. It’s really our intentionality that these artifacts have, second-hand, because we use them to express it. Our intentionality itself, on the other hand, cannot be likewise derivative: it must be original. (‘Original’, here, just means not derivative, not borrowed from somewhere else. If there is any intentionality at all, at least some of it must be original; it can’t all be derivative.)
The problem for mind design is that artificial intelligence systems, like sentences and pictures, are also artifacts. So it can seem that their intentionality too must always be derivative—borrowed from their designers or users, presumably—and never original. Yet, if the project of designing and building a system with a mind of its own is ever really to succeed, then it must be possible for an artificial system to have genuine original intentionality, just as we do. Is that possible?
Think again about people and sentences, with their original and derivative intentionality, respectively. What’s the reason for that difference? Is it really that sentences are artifacts, whereas people are not, or might it be something else? Here’s another candidate. Sentences don’t do anything with what they mean: they never pursue goals, draw conclusions, make plans, answer questions, let alone care whether they are right or wrong about the world—they just sit there, utterly inert and heedless. A person, by contrast, relies on what he or she believes and wants in order to make sensible choices and act efficiently; and this entails, in turn, an ongoing concern about whether those beliefs are really true, those goals really beneficial, and so on. In other words, real beliefs and desires are integrally involved in a rational, active existence, intelligently engaged with its environment. Maybe this active, rational engagement is more pertinent to whether the intentionality is original or not than is any question of natural or artificial origin.
Clearly, this is what Dennett’s approach implies. An intentional system, by his lights, is just one that exhibits an appropriate pattern of consistently rational behavior—that is, active engagement with the world. If an artificial system can be produced that behaves on its own in a rational manner, consistently enough and in a suitable variety of circumstances (remember, it doesn’t have to be flawless), then it has original intentionality—it has a mind of its own, just as we do.
On the other hand, Dennett’s account is completely silent about how, or even whether, such a system could actually be designed and built. Intentionality, according to Dennett, depends entirely and exclusively on a certain sort of pattern in a system’s behavior; internal structure and mechanism (if any) are quite beside the point. For scientific mind design, however, the question of how it actually works (and so, how it could be built) is absolutely central—and that brings us to computers.
Computers are important to scientific mind design in two fundamentally different ways. The first is what inspired Turing long ago, and a number of other scientists much more recently. But the second is what really launched AI and gave it its first serious hope of success. In order to understand these respective roles, and how they differ, it will first be necessary to grasp the notion of ‘computer’ at an essential level.
A formal system is like a game in which tokens are manipulated according to definite rules, in order to see what configurations can be obtained. In fact, many familiar games—among them chess, checkers, tic-tac-toe, and go—simply are formal systems. But there are also many games that are not formal systems, and many formal systems that are not games. Among the former are games like marbles, tiddlywinks, billiards, and baseball; and among the latter are a number of systems studied by logicians, computer scientists, and linguists.
This is not the place to attempt a full definition of formal systems; but three essential features can capture the basic idea: (i) they are (as indicated above) token-manipulation systems; (ii) they are digital; and (iii) they are medium independent. It will be worth a moment to spell out what each of these means.
TOKEN-MANIPULATION SYSTEMS. To say that a formal system is a token-manipulation system is to say that you can define it completely by specifying three things:
(1) a set of types of formal tokens or pieces;
(2) one or more allowable starting positions—that is, initial formal arrangements of tokens of these types; and
(3) a set of formal rules specifying how such formal arrangements may or must be changed into others.
This definition is meant to imply that token-manipulation systems are entirely self-contained. In particular, the formality of the rules is twofold: (i) they specify only the allowable next formal arrangements of tokens, and (ii) they specify these in terms only of the current formal arrangement—nothing else is formally relevant at all.
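A toy sketch can make the three clauses concrete. The code below is purely illustrative (every name in it is invented for the example, not drawn from the text); it casts tic-tac-toe, one of the formal games mentioned above, as a token-manipulation system: a set of token types, one allowable starting position, and formal rules that consult nothing but the current arrangement.

```python
# A minimal token-manipulation sketch of tic-tac-toe:
# (1) token types, (2) a starting position, (3) formal rules.

TYPES = {"X", "O"}                         # (1) the types of formal tokens

def starting_position():
    # (2) the one allowable starting position: an empty 3x3 grid
    return [[None] * 3 for _ in range(3)]

def legal_moves(position, player):
    # (3) The rules mention only the current formal arrangement;
    # nothing outside the position is relevant.
    return [(r, c) for r in range(3) for c in range(3)
            if position[r][c] is None]

def apply_move(position, player, move):
    r, c = move
    assert (r, c) in legal_moves(position, player), "illegal move"
    position[r][c] = player
    return position

board = starting_position()
apply_move(board, "X", (1, 1))   # X takes the center square
apply_move(board, "O", (0, 0))   # O takes a corner
```

Note how the sketch is self-contained in exactly the sense just described: `legal_moves` is a function of the current position alone.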
So take chess, for example. There are twelve types of piece, six of each color. There is only one allowable starting position, namely one in which thirty-two pieces of those twelve types are placed in a certain way on an eight-by-eight array of squares. The rules specifying how the positions change are simply the rules specifying how the pieces move, disappear (get captured), or change type (get promoted). (In chess, new pieces are never added to the position; but that’s a further kind of move in other formal games—such as go.) Finally, notice that chess is entirely self-contained: nothing is ever relevant to what moves would be legal other than the current chess position itself.2
And every student of formal logic is familiar with at least one logical system as a token-manipulation game. Here’s one obvious way it can go (there are many others): the kinds of logical symbol are the types, and the marks that you actually make on paper are the tokens of those types; the allowable starting positions are sets of well-formed formulae (taken as premises); and the formal rules are the inference rules specifying steps—that is, further formulae that you write down and add to the current position—in formally valid inferences. The fact that this is called formal logic is, of course, no accident.
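In the same spirit, a single inference rule can be rendered as a purely formal token operation. In this sketch (hypothetical Python, not from the text), formulae are just strings, and modus ponens inspects only their shapes, never their meanings:

```python
# Propositional inference as token manipulation: formulae are strings,
# and modus ponens is a purely formal rule that looks only at the
# shapes of the tokens in the current position.

def modus_ponens(position):
    """From 'P' and 'P -> Q' in the current position, add 'Q'."""
    derived = set(position)
    for formula in position:
        if " -> " in formula:
            antecedent, consequent = formula.split(" -> ", 1)
            if antecedent in position:
                derived.add(consequent)
    return derived

premises = {"it_rains", "it_rains -> ground_wet"}
position = modus_ponens(premises)   # one formal step of the game
```

The rule never asks what `it_rains` means; it matches and rearranges tokens, which is why such inference counts as formal.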
DIGITAL SYSTEMS. Digitalness is a characteristic of certain techniques (methods, devices) for making things, and then (later) identifying what was made. A familiar example of such a technique is writing something down and later reading it. The thing written or made is supposed to be of a specified type (from some set of possible types), and identifying it later is telling what type that was. So maybe you’re supposed to write down specified letters of the alphabet; and then my job is to tell, on the basis of what you produce, which letters you were supposed to write. Then the question is: how well can I do that? How good are the later identifications at recovering the prior specifications?
Such a technique is digital if it is positive and reliable. It is positive if the reidentification can be absolutely perfect. A positive technique is reliable if it not only can be perfect, but almost always is. This bears some thought. We’re accustomed to the idea that nothing—at least, nothing mundane and real-worldly—is ever quite perfect. Perfection is an ideal, never fully attainable in practice. Yet the definition of ‘digital’ requires that perfection be not only possible, but reliably achievable.
Everything turns on what counts as success. Compare two tasks, each involving a penny and an eight-inch checkerboard. The first asks you to place the penny exactly 0.43747 inches in from the nearest edge of the board, and 0.18761 inches from the left; the second asks you to put it somewhere in the fourth rank (row) and the second file (column from the left). Of course, achieving the first would also achieve the second. But the first task is strictly impossible—that is, it can never actually be achieved, but at best approximated. The second task, on the other hand, can in fact be carried out absolutely perfectly—it’s not even hard. And the reason is easy to see: any number of slightly different actual positions would equally well count as complete success—because the penny only has to be somewhere within the specified square.
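The point about the two tasks can be put in code. In this illustrative sketch (invented for the example; coordinates are inches measured from the left and bottom edges of the eight-inch board, with one-inch squares), re-identification quantizes: any placement anywhere inside a square counts, perfectly, as that square.

```python
# The checkerboard task is digital because identification quantizes:
# every physical placement inside a square recovers exactly the same
# (file, rank) specification.

def identify_square(x_inches, y_inches, square_size=1.0):
    """Recover which (file, rank) square a penny is in (0-indexed)."""
    return (int(x_inches // square_size), int(y_inches // square_size))

# Two slightly different physical placements, one and the same
# formal token: second file (index 1), fourth rank (index 3).
identify_square(1.18761, 3.43747)
identify_square(1.90000, 3.01000)
```

Any continuous wobble smaller than the square simply vanishes at identification time, which is what makes the technique positive and reliable.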
Chess is digital: if one player produces a chess position (or move), then the other player can reliably identify it perfectly. Chess positions and moves are like the second task with the penny: slight differences in the physical locations of the figurines aren’t differences at all from the chess point of view—that is, in the positions of the chess pieces. Checkers, go, and tic-tac-toe are like chess in this way, but baseball and billiards are not. In the latter, unlike the former, arbitrarily small differences in the exact position, velocity, smoothness, elasticity, or whatever, of some physical object can make a significant difference to the game. Digital systems, though concrete and material, are insulated from such physical vicissitudes.
MEDIUM INDEPENDENCE. A concrete system is medium independent if what it is does not depend on what physical “medium” it is made of or implemented in. Of course, it has to be implemented in something; and, moreover, that something has to support whatever structure or form is necessary for the kind of system in question. But, apart from this generic prerequisite, nothing specific about the medium matters (except, perhaps, for extraneous reasons of convenience). In this sense, only the form of a formal system is significant, not its matter.
Chess, for instance, is medium independent. Chess pieces can be made of wood, plastic, ivory, onyx, or whatever you want, just as long as they are sufficiently stable (they don’t melt or crawl around) and are movable by the players. You can play chess with patterns of light on a video screen, with symbols drawn in the sand, or even—if you’re rich and eccentric enough—with fleets of helicopters operated by radio control. But you can’t play chess with live frogs (they won’t sit still), shapes traced in the water (they won’t last), or mountain tops (nobody can move them). Essentially similar points can be made about logical symbolism and all other formal systems.
By contrast, what you can light a fire, feed a family, or wire a circuit with is not medium independent, because whether something is flammable, edible, or electrically conductive depends not just on its form but also on what it’s made of. Nor are billiards or baseball independent of their media: what the balls (and bats and playing surfaces) are made of is quite important and carefully regulated. Billiard balls can indeed be made either of ivory or of (certain special) plastics, but hardly of wood or onyx. And you couldn’t play billiards or baseball with helicopters or shapes in the sand to save your life. The reason is that, unlike chess and other formal systems, in these games the details of the physical interactions of the balls and other equipment make an important difference: how they bounce, how much friction there is, how much energy it takes to make them go a certain distance, and so on.
An automatic formal system is a formal system that “moves” by itself. More precisely, it is a physical device or machine such that:
(1) some configurations of its parts or states can be regarded as the tokens and positions of some formal system; and
(2) in its normal operation, it automatically manipulates these tokens in accord with the rules of that system.
So it’s like a set of chess pieces that hop around the board, abiding by the rules, all by themselves, or like a magical pencil that writes out formally correct logical derivations, without the guidance of any logician.
Of course, this is exactly what computers are, seen from a formal perspective. But, if we are to appreciate properly their importance for mind design, several fundamental facts and features will need further elaboration—among them the notions of implementation and universality, algorithmic and heuristic procedures, and digital simulation.
IMPLEMENTATION AND UNIVERSALITY. Perhaps the most basic idea of computer science is that you can use one automatic formal system to implement another. This is what programming is. Instead of building some special computer out of hardware, you build it out of software; that is, you write a program for a “general purpose” computer (which you already have) that will make it act exactly as if it were the special computer that you need. One computer so implements another when:
(1) some configurations of tokens and positions of the former can be regarded as the tokens and positions of the latter; and
(2) as the former follows its own rules, it automatically manipulates those tokens of the latter in accord with the latter’s rules.
In general, those configurations that are being regarded as tokens and positions of the special computer are themselves only a fraction of the tokens and positions of the general computer. The remainder (which may be the majority) are the program. The general computer follows its own rules with regard to all of its tokens; but the program tokens are so arranged that the net effect is to manipulate the configurations implementing the tokens of the special computer in exactly the way required by its rules.
This is complicated to describe, never mind actually to achieve; and the question arises how often such implementation is possible in principle. The answer is as surprising as it is consequential. In 1937, A. M. Turing—the same Turing we met earlier in our discussion of intelligence—showed, in effect, that it is always possible. Put somewhat more carefully, he showed that there are some computing machines—which he called universal machines—that can implement any well-defined automatic formal system whatsoever, provided only that they have enough storage capacity and time. Not only that, he showed also that universal machines can be amazingly simple; and he gave a complete design specification for one.
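Turing’s actual construction is far beyond our scope, but a drastically simplified sketch (hypothetical code, invented for this example) conveys the idea of one token-manipulation system implementing another. The “general” machine below knows only how to apply rewrite rules from a table; a particular table, the program, makes it behave as a special machine for unary addition:

```python
# A toy 'general purpose' token-manipulator: it applies whatever
# rewrite rules the program table supplies. The table, not the
# machine, determines which special system gets implemented.

def run(program, position, steps):
    """Apply the first matching rewrite rule, up to `steps` times."""
    for _ in range(steps):
        for pattern, replacement in program:
            if pattern in position:
                position = position.replace(pattern, replacement, 1)
                break
    return position

# 'Software' implementing a special machine: unary addition.
# Tokens are '1' marks and a '+' separator; the rules shuffle the
# '+' leftward and then erase it, merging the two numerals.
adder = [("1+", "+1"), ("+1", "1")]
run(adder, "111+11", steps=10)   # 3 + 2, in unary
```

The same `run` machine, handed a different rule table, would implement a different special system; that interchangeability is the germ of universality.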
Every ordinary (programmable) computer is a universal machine in Turing’s sense. In other words, the computer on your desk, given the right program and enough memory, could be made equivalent to any computer that is possible at all, in every respect except speed. Anything any computer can do, yours can too, in principle. Indeed, the machine on your desk can be (and usually is) lots of computers at once. From one point of view, it is a “hardware” computer modifying, according to strict formal rules, complex patterns of tiny voltage tokens often called “bits”. Viewed another way, it is simultaneously a completely different system that shuffles machine-language words called “op-codes”, “data”, and “addresses”. And, depending on what you’re up to, it may also be a word processor, a spell checker, a macro interpreter, and/or whatever.
ALGORITHMS AND HEURISTICS. Often a specific computer is designed and built (or programmed) for a particular purpose: there will be some complicated rearrangement of tokens that it would be valuable to bring about automatically. Typically, a designer works with facilities that can carry out simple rearrangements easily, and the job is to find a combination of them (usually a sequence of steps) that will collectively achieve the desired result. Now there are two basic kinds of case, depending mainly on the character of the assigned task.
In many cases, the designer is able to implement a procedure that is guaranteed always to work—that is, to effect the desired rearrangement, regardless of the input, in a finite amount of time. Suppose, for instance, that the input is always a list of English words, and the desired rearrangement is to put them in alphabetical order. There are known procedures that are guaranteed to alphabetize any given list in finite time. Such procedures, ones that are sure to succeed in finite time, are called algorithms. Many important computational problems can be solved algorithmically.
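A minimal sketch of such an algorithm (my illustration, not from the text): merge sort is guaranteed to alphabetize any finite list of words in a finite number of steps, regardless of the input.

```python
def merge_sort(words):
    """Alphabetize a list of words; guaranteed to finish in finite time."""
    if len(words) <= 1:
        return list(words)
    mid = len(words) // 2
    left, right = merge_sort(words[:mid]), merge_sort(words[mid:])
    # Merge the two sorted halves, always taking the alphabetically
    # earlier front element.
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

print(merge_sort(["pawn", "king", "rook", "bishop"]))
# → ['bishop', 'king', 'pawn', 'rook']
```

Each recursive call halves the list and each merge takes finitely many steps, so termination is assured: exactly what makes this an algorithm rather than a rule of thumb.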
But many others cannot, for theoretical or practical reasons. The task, for instance, might be to find the optimal move in any given chess position. Technically, chess is finite; so, theoretically, it would be possible to check every possible outcome of every possible move, and thus choose flawlessly, on the basis of complete information. But, in fact, even if the entire planet Earth were one huge computer built with the best current technology, it could not solve this problem even once in the life of the Solar System. So chess by brute force is impractical. But that, obviously, does not mean that machines can’t come up with good chess moves. How do they do that?
They rely on general estimates and rules of thumb: procedures that, while not guaranteed to give the right answer every time, are fairly reliable most of the time. Such procedures are called heuristics. In the case of chess, sensible heuristics involve looking ahead a few moves in various directions and then evaluating factors like number and kind of pieces, mobility, control of the center, pawn coordination, and so on. These are not infallible measures of the strength of chess positions; but, in combination, they can be pretty good. This is how chess-playing computers work—and likewise many other machines that deal with problems for which there are no known algorithmic solutions.
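One of the evaluation factors just mentioned, material (number and kind of pieces), can be sketched as a toy evaluator. This is my illustration, not the book's; the piece values are the conventional ones and are an assumption.

```python
# Conventional material values (an assumption, not from the text).
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_score(my_pieces, opp_pieces):
    """Heuristic estimate of position strength from material balance alone.
    Counting pieces is algorithmic; treating the count as a measure of
    strength is a fallible rule of thumb."""
    mine = sum(PIECE_VALUES.get(p, 0) for p in my_pieces)
    theirs = sum(PIECE_VALUES.get(p, 0) for p in opp_pieces)
    return mine - theirs

print(material_score(["rook", "pawn", "pawn"], ["bishop", "pawn"]))  # → 3
```

A real chess program would combine several such factors (mobility, center control, pawn coordination) and apply them to positions a few moves ahead.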
The possibility of heuristic procedures on computers is sometimes confusing. In one sense, every digital computation (that does not consult a randomizer) is algorithmic; so how can any of them be heuristic? The answer is again a matter of perspective. Whether any given procedure is algorithmic or heuristic depends on how you describe the task. One and the same procedure can be an algorithm, when described as counting up the number and kinds of pieces, but a mere heuristic rule of thumb, when described as estimating the strength of a position.
This is the resolution of another common confusion as well. It is often said that computers never make mistakes (unless there is a bug in some program or a hardware malfunction). Yet anybody who has ever played chess against a small chess computer knows that it makes plenty of mistakes. But this is just that same issue about how you describe the task. Even that cheap toy is executing the algorithms that implement its heuristics flawlessly every time; seen that way, it never makes a mistake. It’s just that those heuristics aren’t very sophisticated; so, seen as a chess player, the same system makes lots of mistakes.
DIGITAL SIMULATION. One important practical application of computers isn’t really token manipulation at all, except as a means to an end. You see this in your own computer all the time. Word processors and spreadsheets literally work with digital tokens: letters and numerals. But image processors do not: pictures are not digital. Rather, as everybody knows, they are “digitized”. That is, they are divided up into fine enough dots and gradations that the increments are barely perceptible, and the result looks smooth and continuous. Nevertheless, the computer can store and modify them because—redescribed—those pixels are all just digital numerals.
The same thing can be done with dynamic systems: systems whose states interact and change in regular ways over time. If the relevant variables and relationships are known, then time can be divided into small intervals too, and the progress of the system computed, step by tiny step. This is called digital simulation. The most famous real-world example of it is the massive effort to predict the weather by simulating the Earth’s atmosphere. But engineers and scientists—including, as we shall see, many cognitive scientists—rely on digital simulation of nondigital systems all the time.
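A minimal sketch of digital simulation, assuming a very simple dynamic system (Newton's law of cooling, with hypothetical parameters): time is chopped into small increments and the system's progress computed step by tiny step.

```python
def simulate_cooling(temp, ambient, k, dt, steps):
    """Digitally simulate a cooling object, dT/dt = -k * (T - ambient),
    by dividing time into small intervals (Euler's method)."""
    history = [temp]
    for _ in range(steps):
        temp += -k * (temp - ambient) * dt  # change over one tiny interval
        history.append(temp)
    return history

# A hot cup in a 20-degree room; parameters are illustrative only.
temps = simulate_cooling(temp=90.0, ambient=20.0, k=0.5, dt=0.1, steps=50)
```

The temperatures themselves are continuous quantities; the simulation works because, redescribed, each state is just a number the computer can store and update. Weather models do the same thing, only with vastly more variables.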
Turing (1950 [chapter 6 in this volume], 442 [109]) predicted—falsely, as we now know, but not foolishly—that by the year 2000 there would be computers that could pass his test for intelligence. This was before any serious work, theoretical or practical, had begun on artificial intelligence at all. On what, then, did he base his prediction? He doesn’t really say (apart from an estimate—quite low—of how much storage computers would then have). But I think we can see what moved him.
In Turing’s test, the only relevant inputs and outputs are words—all of which are (among other things) formal tokens. So the capacity of human beings that is to be matched is effectively a formal input/output function. But Turing himself had shown, thirteen years earlier, that any formal input/output function from a certain very broad category could be implemented in a routine universal machine, provided only that it had enough memory and time (or speed)—and those, he thought, would be available by century’s end.
Now, this isn’t really a proof, even setting aside the assumptions about size and speed, because Turing did not (and could not) show that the human verbal input/output function fell into that broad category of functions to which his theorem applied. But he had excellent reason to believe that any function computable by any digital mechanism would fall into that category; and he was convinced that there is nothing immaterial or supernatural in human beings. The only alternative remaining would seem to be nondigital mechanisms; and those he believed could be digitally simulated.
Notice that there is nothing in this argument about how the mind might actually work—nothing about actual mind design. There’s just an assumption that there must be some (nonmagical) way that it works, and that, whatever that way is, a computer can either implement it or simulate it. In the subsequent history of artificial intelligence, on the other hand, a number of very concrete proposals have been made about the actual design of human (and/or other) minds. Almost all of these fall into one or the other of two broad groups: those that take seriously the idea that the mind itself is essentially a digital computer (of a particular sort), and those that reject that idea.
The first approach is what I call “good old-fashioned AI”, or GOFAI. (It is also sometimes called “classical” or “symbol-manipulation” or even “language-of-thought” AI.) Research in the GOFAI tradition dominated the field from the mid-fifties through at least the mid-eighties, and for a very good reason: it was (and still is) a well-articulated view of the mechanisms of intelligence that is both intuitively plausible and eminently realizable. According to this view, the mind just is a computer with certain special characteristics—namely, one with internal states and processes that can be regarded as explicit thinking or reasoning. In order to understand the immense plausibility and power of this GOFAI idea, we will need to see how a computer could properly be regarded in this way.
The idea of a formal system emerged first in mathematics, and was inspired by arithmetic and algebra. When people solve arithmetic or algebraic problems, they manipulate tokens according to definite rules, sort of like a game. But there is a profound difference between these tokens and, say, the pieces on a chess board: they mean something. Numerals, for instance, represent numbers (either of specified items or in the abstract), while arithmetic signs represent operations on or relationships among those numbers. (Tokens that mean something in this way are often called symbols.) Chess pieces, checkers, and go stones, by contrast, represent nothing: they are not symbols at all, but merely formal game tokens.
The rules according to which the tokens in a mathematical system may be manipulated and what those tokens mean are closely related. A simple example will bring this out. Suppose someone is playing a formal game with the first fifteen letters of the alphabet. The rules of this game are very restrictive: every starting position consists of a string of letters ending in ‘A’ (though not every such string is legal); and, for each starting position, there is one and only one legal move—which is to append a particular string of letters after the ‘A’ (and then the game is over). The question is: What (if anything) is going on here?
Suppose it occurs to you that the letters might be just an oddball notation for the familiar digits and signs of ordinary arithmetic. There are, however, over a trillion possible ways to translate fifteen letters into fifteen digits and signs. How could you decide which—if any—is the “right” way? The problem is illustrated in table 2.1. The first row gives eight sample games, each legal according to the rules. The next three rows each give a possible translation scheme, and show how the eight samples would come out according to that scheme.
Table 2.1
Letter game and three different translation schemes.
The differences are conspicuous. The sample games as rendered by the first scheme, though consisting of digits and arithmetic signs, look no more like real arithmetic than the letters did—they’re “arithmetic salad” at best. The second scheme, at first glance, looks better: at least the strings have the shape of equations. But, on closer examination, construed as equations, they would all be false—wildly false. In fact, though the signs are plausibly placed, the digits are just as randomly “tossed” as the first case. The third scheme, by contrast, yields strings that not only look like equations, they are equations—they’re all true. And this makes that third scheme seem much more acceptable. Why?
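The letter-game idea can be sketched in code. Since table 2.1 is not reproduced here, the translation scheme below is hypothetical, merely in the spirit of the “third scheme”: one under which legal games come out as true equations.

```python
# A hypothetical translation scheme (not the book's actual table 2.1):
# each letter stands for a digit or an arithmetic sign.
SCHEME = {"A": "=", "B": "1", "C": "+", "D": "3", "E": "4",
          "F": "2", "G": "-", "H": "5"}

def translate(game, scheme):
    """Render a string of game letters under a translation scheme."""
    return "".join(scheme.get(ch, "?") for ch in game)

def is_true_equation(s):
    """Check whether a translated string is a true equation."""
    left, _, right = s.partition("=")
    try:
        return eval(left) == eval(right)  # toy check, digits and signs only
    except (SyntaxError, ZeroDivisionError):
        return False

game = "BCDAE"  # a sample game: the position 'BCDA' plus the move 'E'
print(translate(game, SCHEME))                    # → 1+3=4
print(is_true_equation(translate(game, SCHEME)))  # → True
```

Under a “salad” scheme the same games would translate into strings that fail this check (or fail even to parse); a scheme that makes every legal game come out true is what earns its keep.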
Consider a related problem: translating some ancient documents in a hitherto unknown script. Clearly, if some crank translator proposed a scheme according to which the texts came out gibberish (like the first one in the table) we would be unimpressed. Almost as obviously, we would be unimpressed if they came out looking like sentences, but loony ones: not just false, but scattered, silly falsehoods, unrelated to one another or to anything else. On the other hand, if some careful, systematic scheme finds in them detailed, sensible accounts of battles, technologies, facts of nature, or whatever, that we know about from other sources, then we will be convinced.3 But again: why?
Translation is a species of interpretation (see p. 15 above). Instead of saying what some system thinks or is up to, a translator says what some strings of tokens (symbols) mean. To keep the two species distinct, we can call the former intentional interpretation, since it attributes intentional states, and the latter (translation) semantic interpretation, since it attributes meanings (= semantics).
Like all interpretation, translation is holistic: it is impossible to interpret a brief string completely out of context. For instance, the legal game ‘HDJAN’ happens to come out looking just as true on the second as on the third scheme in our arithmetic example (‘2 × 4 = 8’ and ‘8 − 6 = 2’, respectively). But, in the case of the second scheme, this is obviously just an isolated coincidence, whereas, in the case of the third, it is part of a consistent pattern. Finding meaning in a body of symbols, like finding rationality in a body of behavior, is finding a certain kind of consistent, reliable pattern.
Well, what kind of pattern? Intentional interpretation seeks to construe a system or creature so that what it thinks and does turns out to be consistently reasonable and sensible, given its situation. Semantic interpretation seeks to construe a body of symbols so that what they mean (“say”) turns out to be consistently reasonable and sensible, given the situation. This is why the third schemes in both the arithmetic and ancient-script examples are the acceptable ones: they’re the ones that “make sense” of the texts, and that’s the kind of pattern that translation seeks. I don’t think we will ever have a precise, explicit definition of any phrase like “consistently reasonable and sensible, given the situation”. But surely it captures much of what we mean (and Turing meant) by intelligence, whether in action or in expression.
Needless to say, interpretation and automation can be combined. A simple calculator, for instance, is essentially an automated version of the letter-game example, with the third interpretation. And the system that Turing envisioned—a computer with inputs and outputs that could be understood as coherent conversation in English—would be an interpreted automatic formal system. But it’s not GOFAI.
So far, we have considered systems the inputs and outputs of which can be interpreted. But we have paid no attention to what goes on inside of those systems—how they get from an input to an appropriate output. In the case of a simple calculator, there’s not much to it. But imagine a system that tackles harder problems—like “word problems” in an algebra or physics text, for instance. Here the challenge is not doing the calculations, but figuring out what calculations to do. There are many possible things to try, only one or a few of which will work.
A skilled problem solver, of course, will not try things at random, but will rely on experience and rules of thumb for guidance about what to try next, and about how things are going so far (whether it would be best to continue, to back-track, to start over, or even to give up). We can imagine someone muttering: “If only I could get that, then I could nail this down; but, in order to get that, I would need such and such. Now, let me see…well, what if…” (and so on). Such canny, methodical exploration—neither algorithmic nor random—is a familiar sort of articulate reasoning or thinking a problem out.
But each of those steps (conjectures, partial results, subgoals, blind alleys, and so on) is—from a formal point of view—just another token string. As such, they could easily be intermediate states in an interpreted automatic formal system that took a statement of the problem as input and gave a statement of the solution as output. Should these intermediate strings themselves then be interpreted as steps in thinking or reasoning the problem through? If two conditions are met, then the case becomes quite compelling. First, the system had better be able to handle with comparable facility an open-ended and varied range of problems, not just a few (the solutions to which might have been “precanned”). And, it had better be arriving at its solutions actually via these steps. (It would be a kind of fraud if it were really solving the problem in some other way, and then tacking on the “steps” for show afterwards.)
GOFAI is predicated on the idea that systems can be built to solve problems by reasoning or thinking them through in this way, and, moreover, that this is how people solve problems. Of course, we aren’t always consciously aware of such reasoning, especially for the countless routine problems—like those involved in talking, doing chores, and generally getting along—that we “solve” all the time. But the fact that we are not aware of it doesn’t mean that it’s not going on, subconsciously or somehow “behind the scenes”.
The earliest GOFAI efforts emphasized problem-solving methods, especially the design of efficient heuristics and search procedures, for various specific classes of problems. (The article by Newell and Simon reviews this approach.) These early systems, however, tended to be quite “narrow-minded” and embarrassingly vulnerable to unexpected variations and oddities in the problems and information they were given. Though they could generate quite clever solutions to complicated problems that were carefully posed, they conspicuously lacked “common sense”—they were hopelessly ignorant—so they were prone to amusing blunders that no ordinary person would ever make.
Later designs have therefore emphasized broad, common-sense knowledge. Of course, problem-solving heuristics and search techniques are still essential; but, as research problems, these were overshadowed by the difficulties of large-scale “knowledge representation”. The biggest problem turned out to be organization. Common-sense knowledge is vast; and, it seems, almost any odd bit of it can be just what is needed to avoid some dumb mistake at any particular moment. So all of it has to be at the system’s “cognitive fingertips” all the time. Since repeated exhaustive search of the entire knowledge base would be quite impractical, some shortcuts had to be devised that would work most of the time. This is what efficient organizing or structuring of the knowledge is supposed to provide.
Knowledge-representation research, in contrast to heuristic problem solving, has tended to concentrate on natural language ability, since this is where the difficulties it addresses are most obvious. The principal challenge of ordinary conversation, from a designer’s point of view, is that it is so often ambiguous and incomplete—mainly because speakers take so much for granted. That means that the system must be able to fill in all sorts of “trivial” gaps, in order to follow what’s being said. But this is still GOFAI, because the filling in is being done rationally. Behind the scenes, the system is explicitly “figuring out” what the speaker must have meant, on the basis of what it knows about the world and the context. (The articles by Minsky and Dreyfus survey some of this work, and Dreyfus and Searle also criticize it.)
Despite its initial plausibility and promise, however, GOFAI has been in some ways disappointing. Expanding and organizing a system’s store of explicit knowledge seems at best partially to solve the problem of common sense. This is why the Turing test will not soon be passed. Further, it is surprisingly difficult to design systems that can adjust their own knowledge in the light of experience. The problem is not that they can’t modify themselves, but that it’s hard to figure out just which modifications to make, while keeping everything else coherent. Finally, GOFAI systems tend to be rather poor at noticing unexpected similarities or adapting to unexpected peculiarities. Indeed, they are poor at recognizing patterns more generally—such as perceived faces, sounds, or kinds of objects—let alone learning to recognize them.
None of this means, of course, that the program is bankrupt. Rome was not built in a day. There is a great deal of active research, and new developments occur all the time. It has meant, however, that some cognitive scientists have begun to explore various alternative approaches.
By far the most prominent of these new-fangled ideas—we could call them collectively NFAI (en-fai)—falls under the general rubric of connectionism. This is a diverse and still rapidly evolving bundle of systems and proposals that seem, on the face of it, to address some of GOFAI’s most glaring weaknesses. On the other hand, connectionist systems are not so good—at least not yet—at matching GOFAI’s most obvious strengths. (This suggests, of course, a possibility of joining forces; but, at this point, it’s too soon to tell whether any such thing could work, never mind how it might be done.) And, in the meantime, there are other NFAI ideas afloat, that are neither GOFAI nor connectionist. The field as a whole is in more ferment now than it has been since the earliest days, in the fifties.
Connectionist systems are networks of lots of simple active units that have lots of connections among them, by which they can interact. There is no central processor or controller, and also no separate memory or storage mechanism. The only activity in the system is these little units changing state, in response to signals coming in along those connections, and then sending out signals of their own. There are two ways in which such a network can achieve a kind of memory. First, in the short term, information can be retained in the system over time insofar as the units tend to change state only slowly (and, perhaps, regularly). Second, and in the longer term, there is a kind of memory in the connections themselves. For, each connection always connects the same two units (they don’t move around); and, more significant, each connection has a property, called its “weight” or “strength”, which is preserved over time.
Obviously, connectionist networks are inspired to some extent by brains and neural networks. The active units are like individual neurons, and the connections among them are like the axons and dendrites along which electro-chemical “pulses” are sent from neuron to neuron. But, while this analogy is important, it should not be overstressed. What makes connectionist systems interesting as an approach to AI is not the fact that their structure mimics biology at a certain level of description, but rather what they can do. After all, there are countless other levels of description at which connectionist nets are utterly unbiological; and, if some GOFAI account turns out to be right about human intelligence, then there will be some level of description at which it too accurately models the brain. Connectionist and allied research may someday show that neural networks are the level at which the brain implements psychological structures; but this certainly cannot be assumed at the outset.
In order to appreciate what is distinctive about network models, it is important to keep in mind how simple and relatively isolated the active units are. The “state” of such a unit is typically just a single quantitative magnitude—specifiable with a single number—called its activation level. This activation level changes in response to signals arriving from other units, but only in a very crude way. In the first place, it pays no attention to which signals came from which other units, or how any of those signals might be related to others: it simply adds them indiscriminately together and responds only to the total. Moreover, that response, the change in activation, is a simple function of that total; and the signal it then sends to other units is just a simple function of that resulting activation.
Now there is one small complication, which is the root of everything interesting about these models. The signal that a unit receives from another is not the same as the signal that the other unit sent: it is multiplied—increased or decreased—by the weight or strength of the connection between them. And there are always many more connections in a network than there are units, simply because each unit is connected to many others. That means that the overall state of the network—that is, the pattern of activations of all its units—can change in very subtle and sophisticated ways, as a function of its initial state. The overall pattern of connection weights is what determines these complicated changes, and thus the basic character of the network.
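A single unit's update can be sketched as follows. This is a minimal illustration, assuming a logistic squashing function (one common choice, not specified in the text): each incoming signal is multiplied by the weight of its connection, the products are summed indiscriminately, and the new activation is a simple function of that total.

```python
import math

def unit_update(incoming_signals, weights):
    """One connectionist unit: weight each incoming signal by its
    connection, add the products indiscriminately, and pass the total
    through a simple squashing function to get the new activation."""
    total = sum(s * w for s, w in zip(incoming_signals, weights))
    return 1.0 / (1.0 + math.exp(-total))  # logistic squashing, range (0, 1)

# Three incoming signals, each scaled by its connection weight.
activation = unit_update([1.0, 0.5, -0.3], [0.8, -0.4, 0.9])
```

Note that the unit never considers which signal came from which neighbor or how signals relate to one another; all the sophistication of the network lives in the pattern of weights, not in any single unit.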
Accordingly, connectionist networks are essentially pattern processors. And, it turns out, they can be quite good at certain psychologically important kinds of pattern processing. In particular, they are adept at finding various sorts of similarities among patterns, at recognizing repeated (or almost repeated) patterns, at filling in the missing parts of incomplete patterns, and at transforming patterns into others with which they have been associated. People are good at these kinds of pattern processing too; but GOFAI systems tend not to be, except in special cases. Needless to say, this is what gets cognitive scientists excited about connectionist models.
Two more points. First, when I say that networks are good at such pattern processing, I mean not only that they can do it well, but also that they can do it quickly. This is a consequence of the fact that, although each unit is very simple, there are a great many of them working at once—in parallel, so to speak—so the cumulative effect in each time increment can be quite substantial. Second, techniques have been discovered by means of which networks can be trained through exposure to examples. That is, the connection weights required for some desired pattern-processing ability can be induced (“taught”) by giving the network a number of sample instances, and allowing it slowly to adjust itself. (It should be added, however, that the training techniques so far discovered are not psychologically realistic: people learn from examples too, but, for various reasons, we know it can’t be in quite these ways.)
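Training by exposure to examples can be given a toy illustration: the delta rule for a single linear unit, shown below. This is only one simple member of the family of techniques the text alludes to, not the specific methods Haugeland has in mind.

```python
def train(samples, weights, rate=0.1, epochs=100):
    """Slowly adjust connection weights so that each sample input
    comes to produce (roughly) its target output.

    samples: list of (inputs, target) pairs; weights: one per input line.
    """
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(x * w for x, w in zip(inputs, weights))
            error = target - output
            # Nudge each weight in the direction that shrinks the error.
            weights = [w + rate * error * x for x, w in zip(inputs, weights)]
    return weights
```

For instance, `train([([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)], [0.0, 0.0])` drives the weights toward `[1.0, 0.0]` without anyone ever setting a weight explicitly: the network slowly adjusts itself to the sample instances.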
I mentioned a moment ago that GOFAI systems are not so good at pattern processing, except in special cases. In comparing approaches to mind design, however, it is crucial to recognize that some of these “special cases” are extremely important. In particular, GOFAI systems are remarkably good at processing (recognizing, transforming, producing) syntactical (grammatical) patterns of the sort that are characteristic of logical formulae, ordinary sentences, and many inferences. What’s more, connectionist networks are not (so far?) particularly good at processing these patterns. Yet language is surely a central manifestation of (human) intelligence. No approach to mind design that cannot accommodate language ability can possibly be adequate.
Connectionist researchers use computers in their work just as much as GOFAI researchers do; but they use them differently. Pattern-processing networks are not themselves automatic formal systems: they do not manipulate formal tokens, and they are not essentially digital. To be sure, the individual units and connections are sharply distinct from one another; and, for convenience, their activations and weights are sometimes limited to a handful of discrete values. But these are more akin to the “digitization” of images in computer image processing than to the essential digitalness of chess pieces, logical symbols, and words. Thus, connectionist mind design relies on computers more in the way the weather service does, to simulate digitally systems that are not in themselves digital.
It has been shown, however, that some connectionist networks can, in effect, implement symbol manipulation systems. Although these implementations tend not to be very efficient, they are nevertheless interesting. For one thing, they may show how symbol manipulation could be implemented in the brain. For another, they might yield ways to build and understand genuine hybrid systems—that is, systems with the advantages of both approaches. Such possibilities aside, however, symbolic implementation would seem at best a Pyrrhic victory: the network would be relegated to the role of “hardware”, while the psychological relevance, the actual mind design, would still be GOFAI.
GOFAI is inspired by the idea that intelligence as such is made possible by explicit thinking or reasoning—that is, by the rational manipulation of internal symbol structures (interpreted formal tokens). Thus, GOFAI intentionality is grounded in the possibility of translation—semantic interpretation. Connectionist NFAI, by contrast, is inspired initially by the structure of the brain, but, more deeply, by the importance and ubiquity of non-formal pattern processing. Since there are no formal tokens (unless implemented at a higher level), there can be no semantically interpreted symbols. Thus, to regard these systems as having intentional states would be to adopt Dennett’s intentional stance—that is, intentional interpretation.
GOFAI is a fairly coherent research tradition, based on a single basic idea: thinking as internal symbol manipulation. ‘NFAI’, by contrast, is more a grab-bag term: it means, roughly, scientific mind design that is not GOFAI. Connectionism falls under this umbrella, but several other possibilities do as well, of which I will mention just one.
Connectionist and GOFAI systems, for all their differences, tend to have one feature in common: they accept an input from somewhere, they work on it for a while, and then they deliver an output. All the “action” is within the system, rather than being an integral part of a larger interaction with an active body and an active environment. The alternative, to put it radically (and perhaps a bit contentiously), would be to have the intelligent system be the larger interactive whole, including the body and environment as essential components. Now, of course, this whole couldn’t be intelligent if it weren’t for a special “subsystem” such as might be implemented in a computer or a brain; but, equally, perhaps, that subsystem couldn’t be intelligent either except as part of a whole comprising the other components as well.
Why would anyone think this? It goes without saying that, in general, intelligent systems ought to be able to act intelligently “in” the world. That’s what intelligence is for, ultimately. Yet, achieving even basic competence in real robots turns out to be surprisingly hard. A simple example can illustrate the point and also the change in perspective that motivates some recent research. Consider a system that must be able, among other things, to approach and unlock a door. How will it get the key in the lock? One approach would equip the robot with:
(1) precise sensors to identify and locate the lock, and monitor the angles of the joints in its own arm and hand;
(2) enough modelling power to convert joint information into a representation of the location and orientation of the key (in the coordinate system of the lock), compute the exact key motion required, and then convert that back into joint motions; and
(3) motors accurate enough to effect the computed motions, and thereby to slide the key in, smooth and straight, the first time.
Remarkably, such a system is utterly impractical, perhaps literally impossible, even with state-of-the-art technology. Yet insects, with far less computational power on board, routinely perform much harder tasks.
How would insectile “intelligence” approach the key-lock problem? First, the system would have a crude detector to notice and aim at locks, more or less. But, it would generate no central representation of the lock’s position, for other subsystems to use in computing arm movements. Rather, the arm itself would have its own ad hoc, but more local, detectors that enable it likewise to home in on a lock, more or less (and also, perhaps, to adjust its aim from one try to the next). And, in the meantime, the arm and its grip on the key would both be quite flexible, and the lock would have a kind of funnel around its opening, so any stab that’s at all close would be guided physically right into the lock. Now that’s engineering—elegant, cheap, reliable.
But is it intelligence? Well surely not much; but that may not be the right question to ask. Instead, we should wonder whether some similar essential involvement of the body (physical flexibility and special purpose subsystems, for instance) and the world (conveniences like the funnel) might be integral to capacities that are more plausibly intelligent. If so, it could greatly decrease the load on central knowledge, problem solving, and even pattern processing, thereby circumventing (perhaps) some of the bottlenecks that frustrate current designs.
To get a feel for the possibilities, move for a moment to the other end of the spectrum. Human intelligence is surely manifested in the ability to design and make things—using, as the case may be, boards and nails. Now, for such a design to work, it must be possible to drive nails into pieces of wood in a way that will hold them together. But neither a designer nor a carpenter ever needs to think about that—it need never even occur to them. (They take it for granted, as a fish does water.) The suitability of these materials and techniques is embedded in the structure of their culture: the logging industry, the manufacture of wire, the existence of lumber yards—and, of course, countless bodily skills and habits passed down from generation to generation.
Think how much “knowledge” is contained in the traditional shape and heft of a hammer, as well as in the muscles and reflexes acquired in learning to use it—though, again, no one need ever have thought of it. Multiply that by our food and hygiene practices, our manner of dress, the layout of buildings, cities, and farms. To be sure, some of this was explicitly figured out, at least once upon a time; but a lot of it wasn’t—it just evolved that way (because it worked). Yet a great deal, perhaps even the bulk, of the basic expertise that makes human intelligence what it is, is maintained and brought to bear in these “physical” structures. It is neither stored nor used inside the head of anyone—it’s in their bodies and, even more, out there in the world.
Scientific research into the kinds of systems that might achieve intelligence in this way—embodied and embedded mind design—is still in an early phase.
A common complaint about artificial intelligence, of whatever stripe, is that it pays scant attention to feelings, emotions, ego, imagination, moods, consciousness—the whole “phenomenology” of an inner life. No matter how smart the machines become, so the worry goes, there’s still “nobody home”. I think there is considerable merit in these misgivings, though, of course, more in some forms than in others. Here, however, I would like briefly to discuss only one form of the worry, one that strikes me as more basic than the others, and also more intimately connected with cognition narrowly conceived.
No current approach to artificial intelligence takes understanding seriously—where understanding itself is understood as distinct from knowledge (in whole or in part) and prerequisite thereto. It seems to me that, taken in this sense, only people ever understand anything—no animals and no artifacts (yet). It follows that, in a strict and proper sense, no animal or machine genuinely believes or desires anything either—How could it believe something it doesn’t understand?—though, obviously, in some other, weaker sense, animals (at least) have plenty of beliefs and desires. This conviction, I should add, is not based on any in-principle barrier; it’s just an empirical observation about what happens to be the case at the moment, so far as we can tell.
So, what is it for a system to understand something? Imagine a system that makes or marks a battery of related distinctions in the course of coping with some range of objects. These distinctions can show up in the form of differing skillful responses, different symbol structures, or whatever. Let’s say that, for each such distinction, the system has a proto-concept. Now I suggest that a system understands the objects to which it applies its proto-concepts insofar as:
(1) it takes responsibility for applying the proto-concepts correctly;
(2) it takes responsibility for the empirical adequacy of the proto-concepts themselves; and
(3) it takes a firm stand on what can and cannot happen in the world, when grasped in terms of these proto-concepts.
When these conditions are met, moreover, the proto-concepts are not merely proto-concepts, but concepts in the full and proper sense.
The three conditions are not unrelated. For, it is precisely in the face of something impossible seeming to have happened, that the question of correct application becomes urgent. We can imagine the system responding in some way that we would express by saying: “This can’t be right!” and then trying to figure out what went wrong. The responsibility for the concepts themselves emerges when, too often, it can’t find any mistake. In that event, the conceptual structure itself must be revised, either by modifying the discriminative abilities that embody the concepts, or by modifying the stand it takes on what is and isn’t possible, or both. Afterward, it will have (more or less) new concepts.
A system that appropriates and takes charge of its own conceptual resources in this way is not merely going through the motions of intelligence, whether evolved, learned, or programmed-in, but rather grasps the point of them for itself. It does not merely make discriminations or produce outputs that, when best interpreted by us, come out true. Rather, such a system appreciates for itself the difference between truth and falsity, appreciates that, in these, it must accede to the world, that the world determines which is which—and it cares. That, I think, is understanding.4
1. Both parts of this idea have their roots in W.V.O. Quine’s pioneering (1960) investigations of meaning. (Meaning is the linguistic or symbolic counterpart of intentionality.)
2. Chess players will know that the rules for castling, stalemate, and capturing en passant depend also on previous events; so, to make chess strictly formal, these conditions would have to be encoded in further tokens (markers, say) that count as part of the current position.
3. A similar point can be made about code-cracking (which is basically translating texts that are contrived to make that especially difficult). A cryptographer knows she has succeeded when and only when the decoded messages come out consistently sensible, relevant, and true.
4. These ideas are explored further in the last four chapters of Haugeland (1997a).
Allen Newell and Herbert A. Simon
1976
Computer science is the study of the phenomena surrounding computers. The founders of this society understood this very well when they called themselves the Association for Computing Machinery. The machine—not just the hardware, but the programmed living machine—is the organism we study.
This is the tenth Turing Lecture. The nine persons who preceded us on this platform have presented nine different views of computer science. For our organism, the machine, can be studied at many levels and from many sides. We are deeply honored to appear here today and to present yet another view, the one that has permeated the scientific work for which we have been cited. We wish to speak of computer science as an empirical inquiry.
Our view is only one of many; the previous lectures make that clear. However, even taken together the lectures fail to cover the whole scope of our science. Many fundamental aspects of it have not been represented in these ten awards. And if the time ever arrives, surely not soon, when the compass has been boxed, when computer science has been discussed from every side, it will be time to start the cycle again. For the hare as lecturer will have to make an annual sprint to overtake the cumulation of small, incremental gains that the tortoise of scientific and technical development has achieved in his steady march. Each year will create a new gap and call for a new sprint, for in science there is no final word.
Computer science is an empirical discipline. We would have called it an experimental science, but like astronomy, economics, and geology, some of its unique forms of observation and experience do not fit a narrow stereotype of the experimental method. Nonetheless, they are experiments. Each new machine that is built is an experiment. Actually constructing the machine poses a question to nature; and we listen for the answer by observing the machine in operation and analyzing it by all analytical and measurement means available. Each new program that is built is an experiment. It poses a question to nature, and its behavior offers clues to a new answer. Neither machines nor programs are black boxes; they are artifacts that have been designed, both hardware and software, and we can open them up and look inside. We can relate their structure to their behavior and draw many lessons from a single experiment. We don’t have to build 100 copies of, say, a theorem prover, to demonstrate statistically that it has not overcome the combinatorial explosion of search in the way hoped for. Inspection of the program in the light of a few runs reveals the flaw and lets us proceed to the next attempt.
We build computers and programs for many reasons. We build them to serve society and as tools for carrying out the economic tasks of society. But as basic scientists we build machines and programs as a way of discovering new phenomena and analyzing phenomena we already know about. Society often becomes confused about this, believing that computers and programs are to be constructed only for the economic use that can be made of them (or as intermediate items in a developmental sequence leading to such use). It needs to understand that the phenomena surrounding computers are deep and obscure, requiring much experimentation to assess their nature. It needs to understand that, as in any science, the gains that accrue from such experimentation and understanding pay off in the permanent acquisition of new techniques; and that it is these techniques that will create the instruments to help society in achieving its goals.
Our purpose here, however, is not to plead for understanding from an outside world. It is to examine one aspect of our science, the development of new basic understanding by empirical inquiry. This is best done by illustrations. We will be pardoned if, presuming upon the occasion, we choose our examples from the area of our own research. As will become apparent, these examples involve the whole development of artificial intelligence, especially in its early years. They rest on much more than our own personal contributions. And even where we have made direct contributions, this has been done in cooperation with others. Our collaborators have included especially Cliff Shaw, with whom we formed a team of three through the exciting period of the late fifties. But we have also worked with a great many colleagues and students at Carnegie Mellon University.
Time permits taking up just two examples. The first is the development of the notion of a symbolic system. The second is the development of the notion of heuristic search. Both conceptions have deep significance for understanding how information is processed and how intelligence is achieved. However, they do not come close to exhausting the full scope of artificial intelligence, though they seem to us to be useful for exhibiting the nature of fundamental knowledge in this part of computer science.
One of the fundamental contributions to knowledge of computer science has been to explain, at a rather basic level, what symbols are. This explanation is a scientific proposition about nature. It is empirically derived, with a long and gradual development.
Symbols lie at the root of intelligent action, which is, of course, the primary topic of artificial intelligence. For that matter, it is a primary question for all of computer science. For all information is processed by computers in the service of ends, and we measure the intelligence of a system by its ability to achieve stated ends in the face of variations, difficulties, and complexities posed by the task environment. This general investment of computer science in attaining intelligence is obscured when the tasks being accomplished are limited in scope, for then the full variations in the environment can be accurately foreseen. It becomes more obvious as we extend computers to more global, complex, and knowledge-intensive tasks—as we attempt to make them our agents, capable of handling on their own the full contingencies of the natural world.
Our understanding of the system’s requirements for intelligent action emerges slowly. It is composite, for no single elementary thing accounts for intelligence in all its manifestations. There is no “intelligence principle”, just as there is no “vital principle” that conveys by its very nature the essence of life. But the lack of a simple deus ex machina does not imply that there are no structural requirements for intelligence. One such requirement is the ability to store and manipulate symbols. To put the scientific question, we may paraphrase the title of a famous paper by Warren McCulloch (1961): What is a symbol, that intelligence may use it, and intelligence, that it may use a symbol?
All sciences characterize the essential nature of the systems they study. These characterizations are invariably qualitative in nature, for they set the terms within which more detailed knowledge can be developed. Their essence can often be captured in very short, very general statements. One might judge these general laws, because of their limited specificity, as making relatively little contribution to the sum of a science, were it not for the historical evidence that shows them to be results of the greatest importance.
THE CELL DOCTRINE IN BIOLOGY. A good example of a law of qualitative structure is the cell doctrine in biology, which states that the basic building block of all living organisms is the cell. Cells come in a large variety of forms, though they all have a nucleus surrounded by protoplasm, the whole encased by a membrane. But this internal structure was not, historically, part of the specification of the cell doctrine; it was subsequent specificity developed by intensive investigation. The cell doctrine can be conveyed almost entirely by the statement we gave above, along with some vague notions about what size a cell can be. The impact of this law on biology, however, has been tremendous, and the lost motion in the field prior to its gradual acceptance was considerable.
PLATE TECTONICS IN GEOLOGY. Geology provides an interesting example of a qualitative structure law, interesting because it has gained acceptance in the last decade and so its rise in status is still fresh in our memory. The theory of plate tectonics asserts that the surface of the globe is a collection of huge plates—a few dozen in all—which move (at geological speeds) against, over, and under each other into the center of the earth, where they lose their identity. The movements of the plates account for the shapes and relative locations of the continents and oceans, for the areas of volcanic and earthquake activity, for the deep sea ridges, and so on. With a few additional particulars as to speed and size, the essential theory has been specified. It was of course not accepted until it succeeded in explaining a number of details, all of which hung together (for instance, accounting for flora, fauna, and stratification agreements between West Africa and Northeast South America). The plate-tectonics theory is highly qualitative. Now that it is accepted, the whole earth seems to offer evidence for it everywhere, for we see the world in its terms.
THE GERM THEORY OF DISEASE. It is little more than a century since Pasteur enunciated the germ theory of disease, a law of qualitative structure that produced a revolution in medicine. The theory proposes that most diseases are caused by the presence and multiplication in the body of tiny single-celled living organisms, and that contagion consists in the transmission of these organisms from one host to another. A large part of the elaboration of the theory consisted in identifying the organisms associated with specific diseases, describing them, and tracing their life histories. The fact that this law has many exceptions—that many diseases are not produced by germs—does not detract from its importance. The law tells us to look for a particular kind of cause; it does not insist that we will always find it.
THE DOCTRINE OF ATOMISM. The doctrine of atomism offers an interesting contrast to the three laws of qualitative structure we have just described. As it emerged from the work of Dalton and his demonstrations that the chemicals combined in fixed proportions, the law provided a typical example of qualitative structure: the elements are composed of small, uniform particles, differing from one element to another. But because the underlying species of atoms are so simple and limited in their variety, quantitative theories were soon formulated which assimilated all the general structure in the original qualitative hypothesis. With cells, tectonic plates, and germs, the variety of structure is so great that the underlying qualitative principle remains distinct, and its contribution to the total theory clearly discernible.
CONCLUSION. Laws of qualitative structure are seen everywhere in science. Some of our greatest scientific discoveries are to be found among them. As the examples illustrate, they often set the terms on which a whole science operates.
Let us return to the topic of symbols, and define a physical symbol system. The adjective “physical” denotes two important features: (1) such systems clearly obey the laws of physics—they are realizable by engineered systems made of engineered components; and (2) although our use of the term “symbol” prefigures our intended interpretation, it is not restricted to human symbol systems.
A physical symbol system consists of a set of entities, called symbols, which are physical patterns that can occur as components of another type of entity called an expression (or symbol structure). Thus a symbol structure is composed of a number of instances (or tokens) of symbols related in some physical way (such as one token being next to another). At any instant of time the system will contain a collection of these symbol structures. Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions: processes of creation, modification, reproduction, and destruction. A physical symbol system is a machine that produces through time an evolving collection of symbol structures. Such a system exists in a world of objects wider than just these symbolic expressions themselves.
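The definition above can be made concrete with a minimal sketch. This is our own illustration, not code from the paper: the names (`create`, `modify`, `copy`, `memory`) are hypothetical, and Python tuples stand in for physical patterns.

```python
# A toy physical symbol system (illustrative names are ours).
# Symbols are atomic tokens; expressions (symbol structures) are tuples
# of symbol tokens related by position.

Symbol = str          # an atomic physical pattern
Expression = tuple    # a symbol structure: tokens related "next to" each other

def create(*tokens) -> Expression:
    """Process of creation: build a new expression from tokens."""
    return tuple(tokens)

def modify(expr: Expression, index: int, token) -> Expression:
    """Process of modification: replace one component token."""
    return expr[:index] + (token,) + expr[index + 1:]

def copy(expr: Expression) -> Expression:
    """Process of reproduction."""
    return tuple(expr)

# The system's state at an instant: a collection of symbol structures,
# which evolves through time as processes operate on it.
memory = [create("on", "A", "B"), create("clear", "A")]
memory.append(modify(memory[0], 2, "C"))
```

The point of the sketch is only the ontology: a collection of expressions, plus processes of creation, modification, reproduction, and destruction that produce an evolving collection over time.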
Two notions are central to this structure of expressions, symbols, and objects: designation and interpretation.
DESIGNATION. An expression designates an object if, given the expression, the system can either affect the object itself or behave in ways depending on the object.
In either case, access to the object via the expression has been obtained, which is the essence of designation.
INTERPRETATION. The system can interpret an expression if the expression designates a process and if, given the expression, the system can carry out the process.1
Interpretation implies a special form of dependent action: given an expression, the system can perform the indicated process, which is to say, it can evoke and execute its own processes from expressions that designate them.
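Designation and interpretation can be sketched in a few lines. This is a hedged toy model of our own devising (the names `define` and `interpret` are hypothetical): an expression designates a process, and the system interprets the expression by evoking and executing that process.

```python
# Toy model of designation and interpretation (names are ours).
# A table associates expressions with the processes they designate.

processes = {}

def define(expression, process):
    """Let an expression designate a process (a callable)."""
    processes[expression] = process

def interpret(expression, *args):
    """Given the expression, evoke and carry out the designated process."""
    return processes[expression](*args)

define(("add",), lambda x, y: x + y)
# A process may itself be expressed in terms of further interpretation:
define(("twice",), lambda x: interpret(("add",), x, x))

result = interpret(("twice",), 21)   # the system runs from a description
```

Access to the process via the expression is exactly what the text calls the essence of designation; executing it from the expression is interpretation.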
A system capable of designation and interpretation, in the sense just indicated, must also meet a number of additional requirements, of completeness and closure. We will have space only to mention these briefly; all of them are important and have far-reaching consequences.
(1) A symbol may be used to designate any expression whatsoever. That is, given a symbol, it is not prescribed a priori what expressions it can designate. This arbitrariness pertains only to symbols: the symbol tokens and their mutual relations determine what object is designated by a complex expression. (2) There exist expressions that designate every process of which the machine is capable. (3) There exist processes for creating any expression and for modifying any expression in arbitrary ways. (4) Expressions are stable; once created, they will continue to exist until explicitly modified or deleted. (5) The number of expressions that the system can hold is essentially unbounded.
The type of system we have just defined is not unfamiliar to computer scientists. It bears a strong family resemblance to all general purpose computers. If a symbol-manipulation language, such as LISP, is taken as defining a machine, then the kinship becomes truly brotherly. Our intent in laying out such a system is not to propose something new. Just the opposite: it is to show what is now known and hypothesized about systems that satisfy such a characterization.
We can now state a general scientific hypothesis—a law of qualitative structure for symbol systems:
THE PHYSICAL SYMBOL SYSTEM HYPOTHESIS. A physical symbol system has the
necessary and sufficient means for general intelligent action.
By “necessary” we mean that any system that exhibits general intelligence will prove upon analysis to be a physical symbol system. By “sufficient” we mean that any physical symbol system of sufficient size can be organized further to exhibit general intelligence. By “general intelligent action” we wish to indicate the same scope of intelligence as we see in human action: that in any real situation, behavior appropriate to the ends of the system and adaptive to the demands of the environment can occur, within some limits of speed and complexity.
The Physical Symbol System Hypothesis clearly is a law of qualitative structure. It specifies a general class of systems within which one will find those capable of intelligent action.
This is an empirical hypothesis. We have defined a class of systems; we wish to ask whether that class accounts for a set of phenomena we find in the real world. Intelligent action is everywhere around us in the biological world, mostly in human behavior. It is a form of behavior we can recognize by its effects whether it is performed by humans or not. The hypothesis could indeed be false. Intelligent behavior is not so easy to produce that any system will exhibit it willy-nilly. Indeed, there are people whose analyses lead them to conclude, either on philosophical or on scientific grounds, that the hypothesis is false. Scientifically, one can attack or defend it only by bringing forth empirical evidence about the natural world.
We now need to trace the development of this hypothesis and look at the evidence for it.
A physical symbol system is an instance of a universal machine. Thus the symbol system hypothesis implies that intelligence will be realized by a universal computer. However, the hypothesis goes far beyond the argument, often made on general grounds of physical determinism, that any computation that is realizable can be realized by a universal machine, provided that it is specified. For it asserts specifically that the intelligent machine is a symbol system, thus making a specific architectural assertion about the nature of intelligent systems. It is important to understand how this additional specificity arose.
FORMAL LOGIC. The roots of the hypothesis go back to the program of Frege and of Whitehead and Russell for formalizing logic: capturing the basic conceptual notions of mathematics in logic and putting the notions of proof and deduction on a secure footing. This effort culminated in mathematical logic—our familiar propositional, first-order, and higher-order logics. It developed a characteristic view, often referred to as the “symbol game”. Logic, and by incorporation all of mathematics, was a game played with meaningless tokens according to certain purely syntactic rules. All meaning had been purged. One had a mechanical, though permissive (we would now say nondeterministic), system about which various things could be proved. Thus progress was first made by walking away from all that seemed relevant to meaning and human symbols. We could call this the stage of formal symbol manipulation.
This general attitude is well reflected in the development of information theory. It was pointed out time and again that Shannon had defined a system that was useful only for communication and selection, and which had nothing to do with meaning. Regrets were expressed that such a general name as “information theory” had been given to the field, and attempts were made to rechristen it as “the theory of selective information”—to no avail, of course.
TURING MACHINES AND THE DIGITAL COMPUTER. The development of the first digital computers and of automata theory, starting with Turing’s own work in the 1930s, can be treated together. They agree in their view of what is essential. Let us use Turing’s own model, for it shows the features well.
A Turing machine consists of two memories: an unbounded tape and a finite-state control. The tape holds data, that is, the famous zeros and ones. The machine has a very small set of proper operations—read, write, and scan operations—on the tape. The read operation is not a data operation, but provides conditional branching to a control state as a function of the data under the read head. As we all know, this model contains the essentials of all computers, in terms of what they can do, though other computers with different memories and operations might carry out the same computations with different requirements of space and time. In particular, the model of a Turing machine contains within it the notions both of what cannot be computed and of universal machines—computers that can do anything that can be done by any machine.
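The two-memory structure just described is easy to exhibit in a few lines of code. The following simulator is our own illustrative sketch (the state names and the helper `run` are hypothetical, not part of Turing's formulation): a finite transition table plays the role of the finite-state control, and a sparse dictionary plays the role of the unbounded tape.

```python
# Minimal Turing-machine simulator (illustrative; names are ours).
# The control is a table mapping (state, symbol) -> (write, move, next_state).

def run(table, tape, state, blank="_", halt="H"):
    cells = dict(enumerate(tape))   # sparse, effectively unbounded tape
    head = 0
    while state != halt:
        symbol = cells.get(head, blank)             # conditional branch on data
        write, move, state = table[(state, symbol)]
        cells[head] = write
        head += move                                # +1 = right, -1 = left
    return [cells[i] for i in sorted(cells) if cells[i] != blank]

# Example machine: flip every bit, halting at the first blank.
table = {
    ("s", 1): (0, +1, "s"),
    ("s", 0): (1, +1, "s"),
    ("s", "_"): ("_", 0, "H"),
}
flipped = run(table, [1, 0, 1, 1], "s")
```

Note how the read operation is not a data operation at all: it only selects which entry of the control table fires, just as the text says.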
We should marvel that two of our deepest insights into information processing were achieved in the thirties, before modern computers came into being. It is a tribute to the genius of Alan Turing. It is also a tribute to the development of mathematical logic at the time, and testimony to the depth of computer science’s obligation to it. Concurrently with Turing’s work appeared the work of the logicians Emil Post and (independently) Alonzo Church. Starting from independent notions of logistic systems (Post productions and recursive functions, respectively), they arrived at analogous results on undecidability and universality—results that were soon shown to imply that all three systems were equivalent. Indeed, the convergence of all these attempts to define the most general class of information-processing systems provides some of the force of our conviction that we have captured the essentials of information processing in these models.
In none of these systems is there, on the surface, a concept of the symbol as something that designates. The data are regarded as just strings of zeroes and ones—indeed, that data be inert is essential to the reduction of computation to physical process. The finite-state control system was always viewed as a small controller, and logical games were played to see how small a state system could be used without destroying the universality of the machine. No games, as far as we can tell, were ever played to add new states dynamically to the finite control—to think of the control memory as holding the bulk of the system’s knowledge. What was accomplished at this stage was half of the principle of interpretation—showing that a machine could be run from a description. Thus, this is the stage of automatic formal symbol manipulation.
THE STORED-PROGRAM CONCEPT. With the development of the second generation of electronic machines in the mid-forties (after the Eniac) came the stored-program concept. This was rightfully hailed as a milestone, both conceptually and practically. Programs now can be data, and can be operated on as data. This capability is, of course, already implicit in the model of Turing: the descriptions are on the very same tape as the data. Yet the idea was realized only when machines acquired enough memory to make it practicable to locate actual programs in some internal place. After all, the Eniac had only twenty registers.
The stored-program concept embodies the second half of the interpretation principle, the part that says that the system’s own data can be interpreted. But it does not yet contain the notion of designation—of the physical relation that underlies meaning.
LIST PROCESSING. The next step, taken in 1956, was list processing. The contents of the data structures were now symbols, in the sense of our physical symbol system: patterns that designated, that had referents. Lists held addresses which permitted access to other lists—thus the notion of list structures. That this was a new view was demonstrated to us many times in the early days of list processing when colleagues would ask where the data were—that is, which list finally held the collection of bits that were the content of the system. They found it strange that there were no such bits, there were only symbols that designated yet other symbol structures.
List processing is simultaneously three things in the development of computer science. (1) It is the creation of a genuine dynamic memory structure in a machine that had heretofore been perceived as having fixed structure. It added to our ensemble of operations those that built and modified structure in addition to those that replaced and changed content. (2) It was an early demonstration of the basic abstraction that a computer consists of a set of data types and a set of operations proper to these data types, so that a computational system should employ whatever data types are appropriate to the application, independent of the underlying machine. (3) List-processing produced a model of designation, thus defining symbol manipulation in the sense in which we use this concept in computer science today.
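A small sketch may help fix the idea of a list structure built from addresses. This is our own hypothetical illustration (the class `Cell` and helper `to_list` are not from IPL or LISP): a Python object reference plays the role that a machine address played in 1956.

```python
# List structure in the 1956 sense (illustrative names are ours).
# Each cell holds a symbol (or a reference to another list) and the
# "address" of the next cell.

class Cell:
    def __init__(self, head, tail=None):
        self.head = head   # a symbol, or a reference to another structure
        self.tail = tail   # address of the rest of the list (None = end)

def to_list(cell):
    """Read off a list structure as a nested Python list, for inspection."""
    out = []
    while cell is not None:
        item = cell.head
        out.append(to_list(item) if isinstance(item, Cell) else item)
        cell = cell.tail
    return out

# (plan (goto door)): no terminal "collection of bits" anywhere —
# only symbols designating yet other symbol structures.
inner = Cell("goto", Cell("door"))
plan = Cell("plan", Cell(inner))
```

The puzzled question "where are the data?" has no answer here, which is precisely the new view the text describes.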
As often occurs, the practice of the time already anticipated all the elements of list processing: addresses are obviously used to gain access, the drum machines used linked programs (so called one-plus-one addressing), and so on. But the conception of list processing as an abstraction created a new world in which designation and dynamic symbolic structure were the defining characteristics. The embedding of the early list-processing systems in languages (the IPLs, LISP) is often decried as having been a barrier to the diffusion of list-processing techniques throughout programming practice; but it was the vehicle that held the abstraction together.
LISP. One more step is worth noting: McCarthy’s creation of LISP in 1959-60 (McCarthy, 1960). It completed the act of abstraction, lifting list structures out of their embedding in concrete machines, creating a new formal system with S-expressions, which could be shown to be equivalent to the other universal schemes of computation.
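The abstraction step can be suggested with a toy S-expression reader. This sketch is ours, not McCarthy's (the function `read` is a hypothetical name, and the reader handles only parentheses and atoms): it shows list structure lifted into a notation independent of any concrete machine.

```python
# A toy S-expression reader (our minimal sketch, not LISP itself).

def read(text):
    """Parse one S-expression into nested Python lists of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = parse(i)
                items.append(item)
            return items, i + 1        # skip the closing paren
        return tokens[i], i + 1        # an atom

    expr, _ = parse(0)
    return expr

expr = read("(car (cdr x))")   # -> a machine-independent symbol structure
```

Once expressions live in such a formal notation, their equivalence to the other universal schemes of computation can be stated and proved, which is the completion of the abstraction the text describes.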
CONCLUSION. That the concept of a designating symbol and symbol manipulation does not emerge until the mid-fifties does not mean that the earlier steps were either inessential or less important. The total concept is the join of computability, physical realizability (and by multiple technologies), universality, the symbolic representation of processes (that is, interpretability), and, finally, symbolic structure and designation. Each of the steps provided an essential part of the whole.
The first step in this chain, authored by Turing, is theoretically motivated, but the others all have deep empirical roots. We have been led by the evolution of the computer itself.
The stored-program principle arose out of the experience with Eniac. List processing arose out of the attempt to construct intelligent programs. It took its cue from the emergence of random-access memories, which provided a clear physical realization of a designating symbol in the address. LISP arose out of the evolving experience with list processing.
We come now to the evidence for the hypothesis that physical symbol systems are capable of intelligent action, and that general intelligent action calls for a physical symbol system. The hypothesis is an empirical generalization and not a theorem. We know of no way of demonstrating the connection between symbol systems and intelligence on purely logical grounds. Lacking such a demonstration, we must look at the facts. Our central aim, however, is not to review the evidence in detail, but to use the example before us to illustrate the proposition that computer science is a field of empirical inquiry. Hence, we will only indicate what kinds of evidence there are, and the general nature of the testing process.
The notion of a physical symbol system had taken essentially its present form by the middle of the 1950’s, and one can date from that time the growth of artificial intelligence as a coherent subfield of computer science. The twenty years of work since then has seen a continuous accumulation of empirical evidence of two main varieties. The first addresses itself to the sufficiency of physical symbol systems for producing intelligence, attempting to construct and test specific systems that have such a capability. The second kind of evidence addresses itself to the necessity of having a physical symbol system wherever intelligence is exhibited. It starts with man, the intelligent system best known to us, and attempts to discover whether his cognitive activity can be explained as the working of a physical symbol system. There are other forms of evidence, which we will comment upon briefly later, but these two are the important ones. We will consider them in turn. The first is generally called artificial intelligence, the second, research in cognitive psychology.
CONSTRUCTING INTELLIGENT SYSTEMS. The basic paradigm for the initial testing of the germ theory of disease was: identify a disease, then look for the germ. An analogous paradigm has inspired much of the research in artificial intelligence: identify a task domain calling for intelligence, then construct a program for a digital computer that can handle tasks in that domain. The easy and well-structured tasks were looked at first: puzzles and games, operations-research problems of scheduling and allocating resources, simple induction tasks. Scores, if not hundreds, of programs of these kinds have by now been constructed, each capable of some measure of intelligent action in the appropriate domain.
Of course intelligence is not an all-or-none matter, and there has been steady progress toward higher levels of performance in specific domains, as well as toward widening the range of those domains. Early chess programs, for example, were deemed successful if they could play a game legally and with some indication of purpose; a little later, they reached the level of human beginners; within ten or fifteen years, they began to compete with serious amateurs. Progress has been slow (and the total programming effort invested small) but continuous, and the paradigm of construct-and-test proceeds in a regular cycle—the whole research activity mimicking at the macroscopic level the basic generate-and-test cycle of many of the AI programs.
There is a steadily widening area within which intelligent action is attainable. For the original tasks, research has extended to building systems that handle and understand natural language in a variety of ways, systems for interpreting visual scenes, systems for hand-eye coordination, systems that design, systems that write computer programs, systems for speech understanding—the list is, if not endless, at least very long. If there are limits beyond which the hypothesis will not carry us, they have not yet become apparent. Up to the present, the rate of progress has been governed mainly by the rather modest quantity of scientific resources that have been applied and the inevitable requirement of a substantial system-building effort for each new major undertaking.
Much more has been going on, of course, than simply a piling up of examples of intelligent systems adapted to specific task domains. It would be surprising and unappealing if it turned out that the AI programs performing these diverse tasks had nothing in common beyond their being instances of physical symbol systems. Hence, there has been great interest in searching for mechanisms possessed of generality, and for common components among programs performing a variety of tasks. This search carries the theory beyond the initial symbol-system hypothesis to a more complete characterization of the particular kinds of symbol systems that are effective in artificial intelligence. In the second section of this paper, we will discuss one example of an hypothesis at this second level of specificity: the heuristic-search hypothesis.
The search for generality spawned a series of programs designed to separate out general problem-solving mechanisms from the requirements of particular task domains. The General Problem Solver (GPS) was perhaps the first of these; while among its descendants are such contemporary systems as PLANNER and CONNIVER. The search for common components has led to generalized schemes of representations for goals and plans, methods for constructing discrimination nets, procedures for the control of tree-search, pattern-matching mechanisms, and language-parsing systems. Experiments are at present under way to find convenient devices for representing sequences of time and tense, movement, causality, and the like. More and more, it becomes possible to assemble large intelligent systems in a modular way from such basic components.
We can gain some perspective on what is going on by turning, again, to the analogy of the germ theory. If the first burst of research stimulated by that theory consisted largely in finding the germ to go with each disease, subsequent effort turned to learning what a germ was—to building on the basic qualitative law a new level of structure. In artificial intelligence, an initial burst of activity aimed at building intelligent programs for a wide variety of almost randomly selected tasks is giving way to more sharply targeted research aimed at understanding the common mechanisms of such systems.
THE MODELING OF HUMAN SYMBOLIC BEHAVIOR. The symbol-system hypothesis implies that the symbolic behavior of man arises because he has the characteristics of a physical symbol system. Hence, the results of efforts to model human behavior with symbol systems become an important part of the evidence for the hypothesis, and research in artificial intelligence goes on in close collaboration with research in information-processing psychology, as it is usually called.
The search for explanations of man’s intelligent behavior in terms of symbol systems has had a large measure of success over the past twenty years—to the point where information-processing theory is the leading contemporary point of view in cognitive psychology. Especially in the areas of problem solving, concept attainment, and long-term memory, symbol-manipulation models now dominate the scene.
Research in information-processing psychology involves two main kinds of empirical activity. The first is the conduct of observations and experiments on human behavior in tasks requiring intelligence. The second, very similar to the parallel activity in artificial intelligence, is the programming of symbol systems to model the observed human behavior. The psychological observations and experiments lead to the formulation of hypotheses about the symbolic processes the subjects are using, and these are an important source of the ideas that go into the construction of the programs. Thus many of the ideas for the basic mechanisms of GPS were derived from careful analysis of the protocols that human subjects produced while thinking aloud during the performance of a problem-solving task.
The empirical character of computer science is nowhere more evident than in this alliance with psychology. Not only are psychological experiments required to test the veridicality of the simulation models as explanations of the human behavior, but out of the experiments come new ideas for the design and construction of physical symbol systems.
OTHER EVIDENCE. The principal body of evidence for the symbol-system hypothesis that we have not considered is negative evidence: the absence of specific competing hypotheses as to how intelligent activity might be accomplished—whether by man or by machine. Most attempts to build such hypotheses have taken place within the field of psychology. Here we have had a continuum of theories from the points of view usually labeled “behaviorism” to those usually labeled “Gestalt theory”. Neither of these points of view stands as a real competitor to the symbol-system hypothesis, and for two reasons. First, neither behaviorism nor Gestalt theory has demonstrated, or even shown how to demonstrate, that the explanatory mechanisms it postulates are sufficient to account for intelligent behavior in complex tasks. Second, neither theory has been formulated with anything like the specificity of artificial programs. As a matter of fact, the alternative theories are so vague that it is not terribly difficult to give them information-processing interpretations, and thereby assimilate them to the symbol-system hypothesis.
We have tried to use the example of the Physical Symbol System Hypothesis to illustrate concretely that computer science is a scientific enterprise in the usual meaning of that term: it develops scientific hypotheses which it then seeks to verify by empirical inquiry. We had a second reason, however, for choosing this particular example to illustrate our point. The Physical Symbol System Hypothesis is itself a substantial scientific hypothesis of the kind that we earlier dubbed “laws of qualitative structure”. It represents an important discovery of computer science, which if borne out by the empirical evidence, as in fact appears to be occurring, will have major continuing impact on the field.
We turn now to a second example, the role of search in intelligence. This topic, and the particular hypothesis about it that we shall examine, have also played a central role in computer science, in general, and artificial intelligence, in particular.
Knowing that physical symbol systems provide the matrix for intelligent action does not tell us how they accomplish this. Our second example of a law of qualitative structure in computer science addresses this latter question, asserting that symbol systems solve problems by using the processes of heuristic search. This generalization, like the previous one, rests on empirical evidence, and has not been derived formally from other premises. We shall see in a moment, however, that it does have some logical connection with the symbol-system hypothesis, and perhaps we can expect to formalize the connection at some time in the future. Until that time arrives, our story must again be one of empirical inquiry. We will describe what is known about heuristic search and review the empirical findings that show how it enables action to be intelligent. We begin by stating this law of qualitative structure, the heuristic-search hypothesis.
HEURISTIC-SEARCH HYPOTHESIS. The solutions to problems are represented as symbol structures. A physical symbol system exercises its intelligence in problem solving by search—that is, by generating and progressively modifying symbol structures until it produces a solution structure.
Physical symbol systems must use heuristic search to solve problems because such systems have limited processing resources; in a finite number of steps, and over a finite interval of time, they can execute only a finite number of processes. Of course, that is not a very strong limitation, for all universal Turing machines suffer from it. We intend the limitation, however, in a stronger sense: we mean practically limited. We can conceive of systems that are not limited in a practical way but are capable, for example, of searching in parallel the nodes of an exponentially expanding tree at a constant rate for each unit advance in depth. We will not be concerned here with such systems, but with systems whose computing resources are scarce relative to the complexity of the situations with which they are confronted. The restriction will not exclude any real symbol systems, in computer or man, in the context of real tasks. The fact of limited resources allows us, for most purposes, to view a symbol system as though it were a serial, one-process-at-a-time device. If it can accomplish only a small amount of processing in any short time interval, then we might as well regard it as doing things one at a time. Thus “limited resource symbol system” and “serial symbol system” are practically synonymous. The problem of allocating a scarce resource from moment to moment can usually be treated, if the moment is short enough, as a problem of scheduling a serial machine.
Since ability to solve problems is generally taken as a prime indicator that a system has intelligence, it is natural that much of the history of artificial intelligence is taken up with attempts to build and understand problem-solving systems. Problem solving has been discussed by philosophers and psychologists for two millennia, in discourses dense with a feeling of mystery. If you think there is nothing problematic or mysterious about a symbol system solving problems, you are a child of today, whose views have been formed since mid-century. Plato (and, by his account, Socrates) found difficulty understanding even how problems could be entertained, much less how they could be solved. Let us remind you of how he posed the conundrum in the Meno:
Meno: And how will you inquire, Socrates, into that which you know not? What will you put forth as the subject of inquiry? And if you find what you want, how will you ever know that this is what you did not know?
To deal with this puzzle, Plato invented his famous theory of recollection: when you think you are discovering or learning something, you are really just recalling what you already knew in a previous existence. If you find this explanation preposterous, there is a much simpler one available today, based upon our understanding of symbol systems. An approximate statement of it is:
To state a problem is to designate (1) a test for a class of symbol structures (solutions of the problem), and (2) a generator of symbol structures (potential solutions). To solve a problem is to generate a structure, using (2), that satisfies the test of (1).
We have a problem if we know what we want to do (the test), and if we don’t know immediately how to do it (our generator does not immediately produce a symbol structure satisfying the test). A symbol system can state and solve problems (sometimes) because it can generate and test.
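The generate-and-test formulation can be rendered as a short sketch. The particular generator and test here (searching the natural numbers for one whose square is 169) are invented illustrations of a case where we know how to test a solution without knowing how to construct one directly.

```python
from itertools import count

def solve(generate, test):
    """Generate-and-test: produce candidate symbol structures
    one at a time until one satisfies the problem-defining test."""
    for candidate in generate():
        if test(candidate):
            return candidate

# Illustrative problem: the test (is n*n == 169?) is trivial to state,
# but the generator knows nothing about the solution's location.
def naturals():
    return count(0)  # 0, 1, 2, 3, ...

solution = solve(naturals, lambda n: n * n == 169)
print(solution)  # 13
```

Note that this generator is maximally unintelligent: it uses no information about the problem space, so the cost of solution grows with the distance of the solution from the start of the enumeration.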
If that is all there is to problem solving, why not simply generate at once an expression that satisfies the test? This is, in fact, what we do when we wish and dream. “If wishes were horses, beggars might ride.” But outside the world of dreams, it isn’t possible. To know how we would test something, once constructed, does not mean that we know how to construct it—that we have any generator for doing so.
For example, it is well known what it means to “solve” the problem of playing winning chess. A simple test exists for noticing winning positions, the test for checkmate of the enemy king. In the world of dreams one simply generates a strategy that leads to checkmate for all counter strategies of the opponent. Alas, no generator that will do this is known to existing symbol systems (man or machine). Instead, good moves in chess are sought by generating various alternatives, and painstakingly evaluating them with the use of approximate, and often erroneous, measures that are supposed to indicate the likelihood that a particular line of play is on the route to a winning position. Move generators there are; winning-move generators there are not.
Before there can be a move generator for a problem, there must be a problem space: a space of symbol structures in which problem situations, including the initial and goal situations, can be represented. Move generators are processes for modifying one situation in the problem space into another. The basic characteristics of physical symbol systems guarantee that they can represent problem spaces and that they possess move generators. How, in any concrete situation, they synthesize a problem space and move generators appropriate to that situation is a question that is still very much on the frontier of artificial intelligence research.
The task that a symbol system is faced with, then, when it is presented with a problem and a problem space, is to use its limited processing resources to generate possible solutions, one after another, until it finds one that satisfies the problem-defining test. If the system had some control over the order in which potential solutions were generated, then it would be desirable to arrange this order of generation so that actual solutions would have a high likelihood of appearing early. A symbol system would exhibit intelligence to the extent that it succeeded in doing this. Intelligence for a system with limited processing resources consists in making wise choices of what to do next.
During the first decade or so of artificial-intelligence research, the study of problem solving was almost synonymous with the study of search processes. From our characterization of problems and problem solving, it is easy to see why this was so. In fact, it might be asked whether it could be otherwise. But before we try to answer that question, we must explore further the nature of search processes as it revealed itself during that decade of activity.
EXTRACTING INFORMATION FROM THE PROBLEM SPACE. Consider a set of symbol structures, some small subset of which are solutions to a given problem. Suppose, further, that the solutions are distributed randomly through the entire set. By this we mean that no information exists that would enable any search generator to perform better than a random search. Then no symbol system could exhibit more intelligence (or less intelligence) than any other in solving the problem, although one might experience better luck than another.
A condition, then, for the appearance of intelligence is that the distribution of solutions be not entirely random, that the space of symbol structures exhibit at least some degree of order and pattern. A second condition is that the pattern in the space of symbol structures be more or less detectable. A third condition is that the generator of potential solutions be able to behave differentially, depending on what pattern is detected. There must be information in the problem space, and the symbol system must be capable of extracting and using it. Let us look first at a very simple example, where the intelligence is easy to come by.
Consider the problem of solving a simple algebraic equation:

ax + b = cx + d
The test defines a solution as any expression of the form, x = e, such that ae + b = ce + d. Now, one could use as generator any process that would produce numbers which could then be tested by substituting in the latter equation. We would not call this an intelligent generator.
Alternatively, one could use generators that would make use of the fact that the original equation can be modified—by adding or subtracting equal quantities from both sides, or multiplying or dividing both sides by the same quantity—without changing its solutions. But, of course, we can obtain even more information to guide the generator by comparing the original expression with the form of the solution, and making precisely those changes in the equation that leave its solution unchanged, while at the same time bringing it into the desired form. Such a generator could notice that there was an unwanted cx on the right-hand side of the original equation, subtract it from both sides, and collect terms again. It could then notice that there was an unwanted b on the left-hand side and subtract that. Finally, it could get rid of the unwanted coefficient (a − c) on the left-hand side by dividing.
Thus, by this procedure, which now exhibits considerable intelligence, the generator produces successive symbol structures, each obtained by modifying the previous one; and the modifications are aimed at reducing the differences between the form of the input structure and the form of the test expression, while maintaining the other conditions for a solution.
This simple example already illustrates many of the main mechanisms that are used by symbol systems for intelligent problem solving. First, each successive expression is not generated independently, but is produced by modifying one produced previously. Second, the modifications are not haphazard, but depend upon two kinds of information. They depend on information that is constant over this whole class of algebra problems, and that is built into the structure of the generator itself: all modifications of expressions must leave the equation’s solution unchanged. They also depend on information that changes at each step: detection of the differences in form that remain between the current expression and the desired expression. In effect, the generator incorporates some of the tests the solution must satisfy, so that expressions that don’t meet these tests will never be generated. Using the first kind of information guarantees that only a tiny subset of all possible expressions is actually generated, but without losing the solution expression from this subset. Using the second kind of information arrives at the desired solution by a succession of approximations, employing a simple form of means-ends analysis to give direction to the search.
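The difference-reduction procedure just described can be sketched directly in code. This is an illustrative sketch, not the authors' program: the equation ax + b = cx + d is represented by its four coefficients, and the sketch assumes a ≠ c so that the final division is well defined.

```python
def solve_linear(a, b, c, d):
    """Means-ends analysis for a*x + b = c*x + d, mirroring the text:
    each modification removes one difference between the current
    equation and the solution form x = e, while every step preserves
    the equation's solution (assumes a != c)."""
    # Difference 1: an unwanted c*x on the right-hand side.
    # Subtract c*x from both sides and collect terms.
    a, c = a - c, 0
    # Difference 2: an unwanted b on the left-hand side.
    # Subtract b from both sides.
    d, b = d - b, 0
    # Difference 3: an unwanted coefficient (a - c) on the left.
    # Divide both sides by it.
    return d / a

print(solve_linear(7, 3, 4, 12))  # 7x + 3 = 4x + 12  ->  3.0
```

The two kinds of information in the text are visible here: the invariance-preserving moves are built into the procedure's structure, while the choice of which move to apply next is driven by the difference detected at each step.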
There is no mystery where the information that guided the search came from. We need not follow Plato in endowing the symbol system with a previous existence in which it already knew the solution. A moderately sophisticated generate-and-test system did the trick without invoking reincarnation.
SEARCH TREES. The simple algebra problem may seem an unusual, even pathological, example of search. It is certainly not trial-and-error search, for though there were a few trials, there was no error. We are more accustomed to thinking of problem-solving search as generating lushly branching trees of partial solution possibilities which may grow to thousands, or even millions, of branches, before they yield a solution. Thus, if from each expression it produces, the generator creates B new branches, then the tree will grow as B^D, where D is its depth. The tree grown for the algebra problem had the peculiarity that its branchiness, B, equaled unity.
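The growth law can be checked with a few lines; `frontier_sizes` is a throwaway helper invented for illustration, showing the frontier of a tree with branching factor B at each depth up to D.

```python
def frontier_sizes(B, max_depth):
    """Number of nodes at each depth of a uniform tree with
    branching factor B: the frontier grows as B**D."""
    return [B ** D for D in range(max_depth + 1)]

print(frontier_sizes(1, 5))  # the algebra case: no branching
print(frontier_sizes(2, 5))  # doubling at every level
```

Even the modest branching factor B = 2 yields over a million frontier nodes by depth 20, which is the exponential explosion the text warns of.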
Programs that play chess typically grow broad search trees, amounting in some cases to a million branches or more. Although this example will serve to illustrate our points about tree search, we should note that the purpose of search in chess is not to generate proposed solutions, but to evaluate (test) them. One line of research into game-playing programs has been centrally concerned with improving the representation of the chess board, and the processes for making moves on it, so as to speed up search and make it possible to search larger trees. The rationale for this direction, of course, is that the deeper the dynamic search, the more accurate should be the evaluations at the end of it. On the other hand, there is good empirical evidence that the strongest human players, grandmasters, seldom explore trees of more than one hundred branches. This economy is achieved not so much by searching less deeply than do chess-playing programs, but by branching very sparsely and selectively at each node. This is only possible, without causing a deterioration of the evaluations, by having more of the selectivity built into the generator itself, so that it is able to select for generation only those branches which are very likely to yield important relevant information about the position.
The somewhat paradoxical-sounding conclusion to which this discussion leads is that search—successive generation of potential solution structures—is a fundamental aspect of a symbol system’s exercise of intelligence in problem solving but that the amount of search is not a measure of the amount of intelligence being exhibited. What makes a problem a problem is not that a large amount of search is required for its solution, but that a large amount would be required if a requisite level of intelligence were not applied. When the symbolic system that is endeavoring to solve a problem knows enough about what to do, it simply proceeds directly towards its goal; but whenever its knowledge becomes inadequate, when it enters terra incognita, it is faced with the threat of going through large amounts of search before it finds its way again.
The potential for the exponential explosion of the search tree that is present in every scheme for generating problem solutions warns us against depending on the brute force of computers—even the biggest and fastest computers—as a compensation for the ignorance and unselectivity of their generators. The hope is still periodically ignited in some human breasts that a computer can be found that is fast enough, and that can be programmed cleverly enough, to play good chess by brute-force search. There is nothing known in theory about the game of chess that rules out this possibility. But the modest results of empirical studies on the management of search in sizable trees make this a much less promising direction than it seemed when chess was first chosen as an appropriate task for artificial intelligence. We must regard this as one of the important empirical findings of research with chess programs.
THE FORMS OF INTELLIGENCE. The task of intelligence, then, is to avert the ever-present threat of the exponential explosion of search. How can this be accomplished? The first route, already illustrated by the algebra example and by chess programs that only generate “plausible” moves for further analysis, is to build selectivity into the generator: to generate only structures that show promise of being solutions or of being along the path toward solutions. The usual consequence of doing this is to decrease the rate of branching, not to prevent it entirely. Ultimate exponential explosion is not avoided—save in exceptionally highly structured situations like the algebra example—but only postponed. Hence, an intelligent system generally needs to supplement the selectivity of its solution generator with other information-using techniques to guide search.
Twenty years of experience with managing tree search in a variety of task environments has produced a small kit of general techniques which is part of the equipment of every researcher in artificial intelligence today. Since these techniques have been described in general works like that of Nilsson (1971), they can be summarized very briefly here.
In serial heuristic search, the basic question always is: What shall be done next? In tree search, that question, in turn, has two components: (1) From what node in the tree shall we search next, and (2) What direction shall we take from that node? Information helpful in answering the first question may be interpreted as measuring the relative distance of different nodes from the goal. Best-first search calls for searching next from the node that appears closest to the goal. Information helpful in answering the second question—in what direction to search—is often obtained, as in the algebra example, by detecting specific differences between the current nodal structure and the goal structure described by the test of a solution, and selecting actions that are relevant to reducing these particular kinds of differences. This is the technique known as means-ends analysis, which plays a central role in the structure of the General Problem Solver.
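Best-first search as just described can be sketched with a priority queue. The toy problem space below (integers, with moves +1 and ×2) and the distance-to-goal heuristic are invented illustrations, not from the text.

```python
import heapq

def best_first_search(start, goal, neighbors, h):
    """Best-first search: always expand next the frontier node
    that the heuristic h judges closest to the goal."""
    frontier = [(h(start), start, [start])]
    seen = {start}
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), nxt, path + [nxt]))
    return None

# Toy problem space: reach 11 from 1 using the moves n+1 and n*2,
# guided by absolute distance to the goal.
goal = 11
path = best_first_search(1, goal,
                         lambda n: [n + 1, n * 2],
                         lambda n: abs(goal - n))
print(path)  # [1, 2, 4, 8, 9, 10, 11]
```

Means-ends analysis answers the second question (direction) rather than the first: instead of ranking whole nodes, it inspects the specific differences between the current structure and the goal structure and picks an operator known to reduce them.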
The importance of empirical studies as a source of general ideas in AI research can be demonstrated clearly by tracing the history, through large numbers of problem-solving programs, of these two central ideas: best-first search and means-ends analysis. Rudiments of best-first search were already present, though unnamed, in the Logic Theorist in 1955. The General Problem Solver, embodying means-ends analysis, appeared about 1957—but combined it with modified depth-first search rather than best-first search. Chess programs were generally wedded, for reasons of economy of memory, to depth-first search, supplemented after about 1958 by the powerful alpha-beta pruning procedure. Each of these techniques appears to have been reinvented a number of times, and it is hard to find general, task-independent, theoretical discussions of problem-solving in terms of these concepts until the middle or late 1960’s. The amount of formal buttressing they have received from mathematical theory is still minuscule: some theorems about the reduction in search that can be secured from using the alpha-beta heuristic, a couple of theorems (reviewed by Nilsson, 1971) about shortest-path search, and some very recent theorems on best-first search with a probabilistic evaluation function.
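The alpha-beta pruning procedure mentioned above can be sketched as follows; the three-level toy game tree and its leaf values are invented for illustration.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Alpha-beta pruning over a game tree: lines of play that
    cannot affect the minimax value are cut off without search."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float('-inf')
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the minimizer will never allow this line
        return best
    else:
        best = float('inf')
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, value))
            beta = min(beta, best)
            if alpha >= beta:
                break  # cutoff: the maximizer already has something better
        return best

# Invented toy tree: once the left subtree guarantees the maximizer 3,
# leaf 'g' in the right subtree is pruned without being evaluated.
tree = {'a': ['b', 'c'], 'b': ['d', 'e'], 'c': ['f', 'g']}
leaf = {'d': 3, 'e': 5, 'f': 2, 'g': 9}
result = alphabeta('a', 2, float('-inf'), float('inf'), True,
                   lambda n: tree.get(n, []), lambda n: leaf.get(n, 0))
print(result)  # 3
```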
“WEAK” AND “STRONG” METHODS. The techniques we have been discussing are dedicated to the control of exponential expansion rather than its prevention. For this reason, they have been properly called “weak methods”—methods to be used when the symbol system’s knowledge or the amount of structure actually contained in the problem space are inadequate to permit search to be avoided entirely. It is instructive to contrast a highly-structured situation, which can be formulated, say, as a linear-programming problem, with the less-structured situations of combinatorial problems like the traveling-salesman problem or scheduling problems. (“Less structured” here refers to the insufficiency or nonexistence of relevant theory about the structure of the problem space.)
In solving linear-programming problems, a substantial amount of computation may be required, but the search does not branch. Every step is a step along the way to a solution. In solving combinatorial problems or in proving theorems, tree search can seldom be avoided, and success depends on heuristic search methods of the sort we have been describing.
Not all streams of AI problem-solving research have followed the path we have been outlining. An example of a somewhat different point of view is provided by the work on theorem-proving systems. Here, ideas imported from mathematics and logic have had a strong influence on the direction of inquiry. For example, the use of heuristics was resisted when properties of completeness could not be proved (a bit ironic, since most interesting mathematical systems are known to be undecidable). Since completeness can seldom be proved for best-first search heuristics, or for many kinds of selective generators, the effect of this requirement was rather inhibiting. When theorem-proving programs were continually incapacitated by the combinatorial explosion of their search trees, thought began to be given to selective heuristics, which in many cases proved to be analogues of heuristics used in general problem-solving programs. The set-of-support heuristic, for example, is a form of working backward, adapted to the resolution theorem-proving environment.
A SUMMARY OF THE EXPERIENCE. We have now described the workings of our second law of qualitative structure, which asserts that physical symbol systems solve problems by means of heuristic search. Beyond that, we have examined some subsidiary characteristics of heuristic search, in particular the threat that it always faces of exponential explosion of the search tree, and some of the means it uses to avert that threat. Opinions differ as to how effective heuristic search has been as a problem-solving mechanism—the opinions depending on what task domains are considered and what criterion of adequacy is adopted. Success can be guaranteed by setting aspiration levels low—or failure by setting them high. The evidence might be summed up about as follows: Few programs are solving problems at “expert” professional levels. Samuel’s checker program and Feigenbaum and Lederberg’s DENDRAL are perhaps the best-known exceptions, but one could point also to a number of heuristic search programs for such operations-research problem domains as scheduling and integer programming. In a number of domains, programs perform at the level of competent amateurs: chess, some theorem-proving domains, many kinds of games and puzzles. Human levels have not yet been nearly reached by programs that have a complex perceptual “front end”: visual-scene recognizers, speech understanders, robots that have to maneuver in real space and time. Nevertheless, impressive progress has been made, and a large body of experience assembled about these difficult tasks.
We do not have deep theoretical explanations for the particular pattern of performance that has emerged. On empirical grounds, however, we might draw two conclusions. First, from what has been learned about human expert performance in tasks like chess, it is likely that any system capable of matching that performance will have to have access, in its memories, to very large stores of semantic information. Second, some part of the human superiority in tasks with a large perceptual component can be attributed to the special-purpose built-in parallel-processing structure of the human eye and ear.
In any case, the quality of performance must necessarily depend on the characteristics both of the problem domains and of the symbol systems used to tackle them. For most real-life domains in which we are interested, the domain structure has so far not proved sufficiently simple to yield theorems about complexity, or to tell us, other than empirically, how large real-world problems are in relation to the abilities of our symbol systems to solve them. That situation may change, but until it does, we must rely upon empirical explorations, using the best problem solvers we know how to build, as a principal source of knowledge about the magnitude and characteristics of problem difficulty. Even in highly structured areas like linear programming, theory has been much more useful in strengthening the heuristics that underlie the most powerful solution algorithms than in providing a deep analysis of complexity.
Our analysis of intelligence equated it with ability to extract and use information about the structure of the problem space, so as to enable a problem solution to be generated as quickly and directly as possible. New directions for improving the problem-solving capabilities of symbol systems can be equated, then, with new ways of extracting and using information. At least three such ways can be identified.
NONLOCAL USE OF INFORMATION. First, it has been noted by several investigators that information gathered in the course of tree search is usually only used locally, to help make decisions at the specific node where the information was generated. Information about a chess position, obtained by dynamic analysis of a subtree of continuations, is usually used to evaluate just that position, not to evaluate other positions that may contain many of the same features. Hence, the same facts have to be rediscovered repeatedly at different nodes of the search tree. Simply to take the information out of the context in which it arose and use it generally does not solve the problem, for the information may be valid only in a limited range of contexts. In recent years, a few exploratory efforts have been made to transport information from its context of origin to other appropriate contexts. While it is still too early to evaluate the power of this idea, or even exactly how it is to be achieved, it shows considerable promise. An important line of investigation that Berliner (1975) has been pursuing is to use causal analysis to determine the range over which a particular piece of information is valid. Thus if a weakness in a chess position can be traced back to the move that made it, then the same weakness can be expected in other positions descendant from the same move.
The HEARSAY speech understanding system has taken another approach to making information globally available. That system seeks to recognize speech strings by pursuing a parallel search at a number of different levels: phonemic, lexical, syntactic, and semantic. As each of these searches provides and evaluates hypotheses, it supplies the information it has gained to a common “blackboard” that can be read by all the sources. This shared information can be used, for example, to eliminate hypotheses, or even whole classes of hypotheses, that would otherwise have to be searched by one of the processes. Thus increasing our ability to use tree-search information nonlocally offers promise for raising the intelligence of problem-solving systems.
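The blackboard idea can be caricatured in a few lines: independent knowledge sources post hypotheses to a shared structure and read what the others have posted, so information gained at one level prunes the search at another. Everything in the sketch below—the sources, the word hypotheses, the pruning rule—is invented for illustration and is not the actual HEARSAY implementation.

```python
# Toy "blackboard": two knowledge sources share one structure.
# The lexical source posts word hypotheses; the syntactic source
# reads them and keeps only those that can follow the verb.

blackboard = {"lexical": set(), "syntactic": set()}

def lexical_source(signal):
    # Pretend acoustic evidence supports these word hypotheses.
    words = {"recognize", "wreck", "a", "nice", "an", "ice", "speech", "beach"}
    blackboard["lexical"] |= {w for w in words if w in signal}

def syntactic_source():
    # Read the lexical level; rule out words that cannot begin a
    # noun phrase after the verb "recognize".
    if "recognize" in blackboard["lexical"]:
        blackboard["syntactic"] |= {
            w for w in blackboard["lexical"] if w in {"a", "an", "speech"}
        }

signal = ["recognize", "speech"]
lexical_source(signal)
syntactic_source()
```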
SEMANTIC RECOGNITION SYSTEMS. A second active possibility for raising intelligence is to supply the symbol system with a rich body of semantic information about the task domain it is dealing with. For example, empirical research on the skill of chess masters shows that a major source of the master’s skill is stored information that enables him to recognize a large number of specific features and patterns of features on a chess board, and information that uses this recognition to propose actions appropriate to the features recognized. This general idea has, of course, been incorporated in chess programs almost from the beginning. What is new is the realization of the number of such patterns and associated information that may have to be stored for master-level play: something on the order of 50,000.
The possibility of substituting recognition for search arises because a particular, and especially a rare, pattern can contain an enormous amount of information, provided that it is closely linked to the structure of the problem space. When that structure is “irregular”, and not subject to simple mathematical description, then knowledge of a large number of relevant patterns may be the key to intelligent behavior. Whether this is so in any particular task domain is a question more easily settled by empirical investigation than by theory. Our experience with symbol systems richly endowed with semantic information and pattern-recognizing capabilities for accessing it is still extremely limited.
The discussion above refers specifically to semantic information associated with a recognition system. Of course, there is also a whole large area of AI research on semantic information processing and the organization of semantic memories that falls outside the scope of the topics we are discussing in this paper.
SELECTING APPROPRIATE REPRESENTATIONS. A third line of inquiry is concerned with the possibility that search can be reduced or avoided by selecting an appropriate problem space. A standard example that illustrates this possibility dramatically is the mutilated-checkerboard problem. A standard 64-square checkerboard can be covered exactly with 32 tiles, each a 1 × 2 rectangle covering exactly two squares. Suppose, now, that we cut off squares at two diagonally opposite corners of the checkerboard, leaving a total of 62 squares. Can this mutilated board be covered exactly with 31 tiles? With (literally) heavenly patience, the impossibility of achieving such a covering can be demonstrated by trying all possible arrangements. The alternative, for those with less patience and more intelligence, is to observe that the two diagonally opposite corners of a checkerboard are of the same color. Hence, the mutilated checkerboard has two fewer squares of one color than of the other. But each tile covers one square of one color and one square of the other, and any set of tiles must cover the same number of squares of each color. Hence, there is no solution. How can a symbol system discover this simple inductive argument as an alternative to a hopeless attempt to solve the problem by search among all possible coverings? We would award a system that found the solution high marks for intelligence.
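The coloring argument itself can be checked mechanically. The sketch below counts squares by color rather than enumerating coverings: since each domino always covers one light and one dark square, a region is coverable only if it has equal numbers of each color, and the mutilated board fails that test.

```python
# The coloring argument for the mutilated checkerboard, checked
# mechanically. A square (r, c) is "light" when (r + c) is even.

def color_counts(squares):
    light = sum(1 for r, c in squares if (r + c) % 2 == 0)
    return light, len(squares) - light

board = {(r, c) for r in range(8) for c in range(8)}
mutilated = board - {(0, 0), (7, 7)}   # two diagonally opposite corners

full_light, full_dark = color_counts(board)
mut_light, mut_dark = color_counts(mutilated)
# (0, 0) and (7, 7) are both light, so the mutilated board is
# unbalanced; no exact covering by 31 dominoes can exist.
coverable = (mut_light == mut_dark)
```

The brute-force alternative—enumerating all placements of 31 dominoes—is the "heavenly patience" route the text contrasts with this argument.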
Perhaps, however, in posing this problem we are not escaping from search processes. We have simply displaced the search from a space of possible problem solutions to a space of possible representations. In any event, the whole process of moving from one representation to another, and of discovering and evaluating representations, is largely unexplored territory in the domain of problem-solving research. The laws of qualitative structure governing representations remain to be discovered. The search for them is almost sure to receive considerable attention in the coming decade.
That is our account of symbol systems and intelligence. It has been a long road from Plato’s Meno to the present, but it is perhaps encouraging that most of the progress along that road has been made since the turn of the twentieth century, and a large fraction of it since the mid-point of the century. Thought was still wholly intangible and ineffable until modern formal logic interpreted it as the manipulation of formal tokens. And it seemed still to inhabit mainly the heaven of Platonic ideas, or the equally obscure spaces of the human mind, until computers taught us how symbols could be processed by machines. A. M. Turing made his great contributions at the mid-century crossroads of these developments that led from modern logic to the computer.
PHYSICAL SYMBOL SYSTEMS. The study of logic and computers has revealed to us that intelligence resides in physical-symbol systems. This is computer science’s most basic law of qualitative structure.
Symbol systems are collections of patterns and processes, the latter being capable of producing, destroying, and modifying the former. The most important properties of patterns are that they can designate objects, processes, or other patterns, and that when they designate processes, they can be interpreted. Interpretation means carrying out the designated process. The two most significant classes of symbol systems with which we are acquainted are human beings and computers.
Our present understanding of symbol systems grew, as indicated earlier, through a sequence of stages. Formal logic familiarized us with symbols, treated syntactically, as the raw material of thought, and with the idea of manipulating them according to carefully defined formal processes. The Turing machine made the syntactic processing of symbols truly machine-like, and affirmed the potential universality of strictly defined symbol systems. The stored-program concept for computers reaffirmed the interpretability of symbols, already implicit in the Turing machine. List processing brought to the forefront the denotational capacities of symbols and defined symbol processing in ways that allowed independence from the fixed structure of the underlying physical machine. By 1956 all of these concepts were available, together with hardware for implementing them. The study of the intelligence of symbol systems, the subject of artificial intelligence, could begin.
HEURISTIC SEARCH. A second law of qualitative structure of AI is that symbol systems solve problems by generating potential solutions and testing them—that is, by searching. Solutions are usually sought by creating symbolic expressions and modifying them sequentially until they satisfy the conditions for a solution. Hence, symbol systems solve problems by searching. Since they have finite resources, the search cannot be carried out all at once, but must be sequential. It leaves behind it either a single path from starting point to goal or, if correction and backup are necessary, a whole tree of such paths.
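A minimal generate-and-test search of this kind can be written down directly. In the sketch below the "symbolic expressions" are just integers, the generator applies two operators, the test compares against a goal, and the search leaves behind the path of operators from start to goal. The toy problem (reach 11 from 1 by adding 3 or doubling) is invented for illustration.

```python
# Generate-and-test search: candidate expressions are generated by
# applying operators and each is tested against the solution
# condition; the path from start to goal is what the search leaves
# behind.
from collections import deque

def search(start, goal):
    ops = [("+3", lambda n: n + 3), ("*2", lambda n: n * 2)]
    frontier = deque([(start, [])])        # (state, path of operator names)
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:                  # the test
            return path
        for name, f in ops:                # the generator
            nxt = f(state)
            if nxt <= goal * 2 and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

path = search(1, 11)
```

This breadth-first version keeps a whole tree of partial paths; a depth-first version with backup would instead leave the single path plus the record of corrections, as the text describes.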
Symbol systems cannot appear intelligent when they are surrounded by pure chaos. They exercise intelligence by extracting information from a problem domain and using that information to guide their search, avoiding wrong turns and circuitous by-paths. The problem domain must contain information—that is, some degree of order and structure—for the method to work. The paradox of the Meno is solved by the observation that information may be remembered, but new information may also be extracted from the domain that the symbols designate. In both cases, the ultimate source of the information is the task domain.
THE EMPIRICAL BASE. Research on artificial intelligence is concerned with how symbol systems must be organized in order to behave intelligently. Twenty years of work in the area has accumulated a considerable body of knowledge, enough to fill several books (it already has), and most of it in the form of rather concrete experience about the behavior of specific classes of symbol systems in specific task domains. Out of this experience, however, there have also emerged some generalizations, cutting across task domains and systems, about the general characteristics of intelligence and its methods of implementation.
We have tried to state some of these generalizations here. They are mostly qualitative rather than mathematical. They have more the flavor of geology or evolutionary biology than the flavor of theoretical physics. They are sufficiently strong to enable us today to design and build moderately intelligent systems for a considerable range of task domains, as well as to gain a rather deep understanding of how human intelligence works in many situations.
WHAT NEXT? In our account we have mentioned open questions as well as settled ones; there are many of both. We see no abatement of the excitement of exploration that has surrounded this field over the past quarter century. Two resource limits will determine the rate of progress over the next such period. One is the amount of computing power that will be available. The second, and probably the more important, is the number of talented young computer scientists who will be attracted to this area of research as the most challenging they can tackle.
A. M. Turing concluded his famous paper “Computing Machinery and Intelligence” [chapter 6 of this volume] with the words:
We can only see a short distance ahead, but we can see plenty there that needs to be done.
Many of the things Turing saw in 1950 that needed to be done have been done, but the agenda is as full as ever. Perhaps we read too much into his simple statement above, but we like to think that in it Turing recognized the fundamental truth that all computer scientists instinctively know. For all physical symbol systems, condemned as we are to serial search of the problem environment, the critical question is always: What to do next?
1. Editor’s note: These senses of the terms ‘designation’ and ‘interpretation’, and hence also of ‘symbol’, are specific to computer science; they concern only relationships and processes that occur within a computer. In linguistics and philosophy, by contrast, these terms would usually be explained in terms of relationships between an intelligent system (or what’s inside of it) and its environment. Most of the essays in the present volume use the terms in this latter sense.
David Marr
1982
Almost never can a complex system of any kind be understood as a simple extrapolation from the properties of its elementary components. Consider, for example, some gas in a bottle. A description of thermodynamic effects—temperature, pressure, density and the relationships among these factors—is not formulated by using a large set of equations, one for each of the particles involved. Such effects are described at their own level, that of an enormous collection of particles; the effort is to show that in principle the microscopic and macroscopic descriptions are consistent with one another. If one hopes to achieve a full understanding of a system as complicated as a nervous system, a developing embryo, a set of metabolic pathways, a bottle of gas, or even a large computer program, then one must be prepared to contemplate different kinds of explanation at different levels of description that are linked, at least in principle, into a cohesive whole, even if linking the levels in complete detail is impractical. For the specific case of a system that solves an information-processing problem, there are in addition the twin strands of process and representation, and both these ideas need some discussion.
A representation is a formal system for making explicit certain entities or types of information, together with a specification of how the system does this. And I shall call the result of using a representation to describe a given entity a description of the entity in that representation (Marr and Nishihara, 1978).
For example, the Arabic, Roman, and binary numeral systems are all formal systems for representing numbers. The Arabic representation consists of a string of symbols drawn from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), and the rule for constructing the description of a particular integer n is that one decomposes n into a sum of multiples of powers of 10 and unites these multiples into a string with the largest powers on the left and the smallest on the right. Thus, thirty-seven equals 3 × 10¹ + 7 × 10⁰, which becomes 37, the Arabic numeral system’s description of the number. What this description makes explicit is the number’s decomposition into powers of 10. The binary numeral system’s description of the number thirty-seven is 100101, and this description makes explicit the number’s decomposition into powers of 2. In the Roman numeral system, thirty-seven is represented as XXXVII.
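All three descriptions of thirty-seven can be generated mechanically, which makes the example concrete. The conversion routines below are ordinary ones, not anything specific to the text.

```python
# Three descriptions of the same number in three representations.
# Arabic makes powers of 10 explicit; binary, powers of 2.

def to_binary(n):
    digits = ""
    while n:
        digits = str(n % 2) + digits   # least significant bit last
        n //= 2
    return digits or "0"

def to_roman(n):
    # Subtractive pairs (IV, IX, ...) included for generality.
    vals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for v, s in vals:
        while n >= v:
            out += s
            n -= v
    return out

arabic, binary, roman = str(37), to_binary(37), to_roman(37)
```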
This definition of a representation is quite general. For example, a representation for shape would be a formal scheme for describing some aspects of shape, together with rules that specify how the scheme is applied to any particular shape. A musical score provides a way of representing a symphony; the alphabet allows the construction of a written representation of words; and so forth. The phrase “formal scheme” is critical to the definition, but the reader should not be frightened by it. The reason is simply that we are dealing with information-processing machines, and the way such machines work is by using symbols to stand for things—to represent things, in our terminology. To say that something is a formal scheme means only that it is a set of symbols with rules for putting them together—no more and no less.
A representation, therefore, is not a foreign idea at all—we all use representations all the time. However, the notion that one can capture some aspect of reality by making a description of it using a symbol and that to do so can be useful seems to me a fascinating and powerful idea. But even the simple examples we have discussed introduce some rather general and important issues that arise whenever one chooses to use one particular representation. For example, if one chooses the Arabic numeral representation, it is easy to discover whether a number is a power of 10 but difficult to discover whether it is a power of 2. If one chooses the binary representation, the situation is reversed. Thus, there is a trade-off; any particular representation makes certain information explicit at the expense of information that is pushed into the background and may be quite hard to recover.
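The trade-off can be made concrete: in a positional representation, being a power of the base is visible at a glance—the description is a 1 followed by zeros—while the same question about the other base requires arithmetic. A small illustrative check:

```python
# The representational trade-off made concrete: each description
# makes one question trivial (a string check) and leaves the other
# implicit.

def looks_like_one_then_zeros(digits):
    # True for "1", "10", "100", ...: a power of the base, read off
    # directly from the description with no arithmetic.
    return digits[0] == "1" and set(digits[1:]) <= {"0"}

# In Arabic numerals, powers of 10 are explicit...
assert looks_like_one_then_zeros("100")        # one hundred = 10 squared
# ...and in binary, powers of 2 are explicit.
assert looks_like_one_then_zeros("100000")     # thirty-two in binary
# The same numbers in the other representation hide the property.
assert not looks_like_one_then_zeros("32")
assert not looks_like_one_then_zeros("1100100")
```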
This issue is important, because how information is represented can greatly affect how easy it is to do different things with it. This is evident even from our numbers example: It is easy to add, to subtract, and even to multiply if the Arabic or binary representations are used, but it is not at all easy to do these things—especially multiplication—with Roman numerals. This is a key reason why the Roman culture failed to develop mathematics in the way the earlier Arabic cultures had.
An analogous problem faces computer engineers today. Electronic technology is much more suited to a binary number system than to the conventional base 10 system, yet humans supply their data and require the results in base 10. The design decision facing the engineer, therefore, is, Should one pay the cost of conversion into base 2, carry out the arithmetic in a binary representation, and then convert back into decimal numbers on output; or should one sacrifice efficiency of circuitry to carry out operations directly in a decimal representation? On the whole, business computers and pocket calculators take the second approach, and general purpose computers take the first. But even though one is not restricted to using just one representation system for a given type of information, the choice of which to use is important and cannot be taken lightly. It determines what information is made explicit and hence what is pushed further into the background, and it has a far-reaching effect on the ease and difficulty with which operations may subsequently be carried out on that information.
The term process is very broad. For example, addition is a process, and so is taking a Fourier transform. But so is making a cup of tea, or going shopping. For the purposes of this book, I want to restrict our attention to the meanings associated with machines that are carrying out information-processing tasks. So let us examine in depth the notions behind one simple such device, a cash register at the checkout counter of a supermarket.
There are several levels at which one needs to understand such a device, and it is perhaps most useful to think in terms of three of them. The most abstract is the level of what the device does and why. What it does is arithmetic, so our first task is to master the theory of addition. Addition is a mapping, usually denoted by +, from pairs of numbers into single numbers; for example, + maps the pair (3, 4) to 7, and I shall write this in the form (3 + 4) → 7. Addition has a number of abstract properties, however. It is commutative: both (3 + 4) and (4 + 3) are equal to 7; and associative: the sum of 3 + (4 + 5) is the same as the sum of (3 + 4) + 5. Then there is the unique distinguished element, zero, the adding of which has no effect: (4 + 0) → 4. Also, for every number there is a unique “inverse,” written ( − 4) in the case of 4, which when added to the number gives zero: [4 + (−4)] → 0.
Notice that these properties are part of the fundamental theory of addition. They are true no matter how the numbers are written—whether in binary, Arabic, or Roman representation—and no matter how the addition is executed. Thus part of this first level is something that might be characterized as what is being computed.
The other half of this level of explanation has to do with the question of why the cash register performs addition and not, for instance, multiplication when combining the prices of the purchased items to arrive at a final bill. The reason is that the rules we intuitively feel to be appropriate for combining the individual prices in fact define the mathematical operation of addition. These can be formulated as constraints in the following way:
1. If you buy nothing, it should cost you nothing; and buying nothing and something should cost the same as buying just the something. (The rules for zero.)
2. The order in which goods are presented to the cashier should not affect the total. (Commutativity.)
3. Arranging the goods into two piles and paying for each pile separately should not affect the total amount you pay. (Associativity; the basic operation for combining prices.)
4. If you buy an item and then return it for a refund, your total expenditure should be zero. (Inverses.)
It is a mathematical theorem that these conditions define the operation of addition, which is therefore the appropriate computation to use.
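The four constraints can be verified directly for ordinary addition on sample prices (the item prices below, in cents, are invented for illustration):

```python
# Checking the four constraints above against ordinary addition,
# with item prices in cents.

def combine(a, b):          # the cash register's combining operation
    return a + b

milk, bread, jam = 249, 189, 100

# 1. Rules for zero: buying nothing costs nothing and changes nothing.
assert combine(milk, 0) == milk
# 2. Commutativity: presentation order does not affect the total.
assert combine(milk, bread) == combine(bread, milk)
# 3. Associativity: splitting the goods into piles does not matter.
assert combine(combine(milk, bread), jam) == combine(milk, combine(bread, jam))
# 4. Inverses: a purchase followed by a refund nets to zero.
assert combine(milk, -milk) == 0
```

Finite checks like these do not of course prove the theorem; they only illustrate that addition satisfies the constraints the intuitive rules impose.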
This whole argument is what I call the computational theory of the cash register. Its important features are (1) that it contains separate arguments about what is computed and why and (2) that the resulting operation is defined uniquely by the constraints it has to satisfy. In the theory of visual processes, the underlying task is to reliably derive properties of the world from images of it; the business of isolating constraints that are both powerful enough to allow a process to be defined and generally true of the world is a central theme of our inquiry.
In order that a process shall actually run, however, one has to realize it in some way and therefore choose a representation for the entities that the process manipulates. The second level of the analysis of a process, therefore, involves choosing two things: (1) a representation for the input and for the output of the process and (2) an algorithm by which the transformation may actually be accomplished. For addition, of course, the input and output representations can both be the same, because they both consist of numbers. However this is not true in general. In the case of a Fourier transform, for example, the input representation may be the time domain, and the output, the frequency domain. If the first of our levels specifies what and why, this second level specifies how. For addition, we might choose Arabic numerals for the representations, and for the algorithm we could follow the usual rules about adding the least significant digits first and “carrying” if the sum exceeds 9. Cash registers, whether mechanical or electronic, usually use this type of representation and algorithm.
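The algorithm described here—least significant digits first, carrying when a column sum exceeds 9—can be written out over the Arabic-numeral representation itself:

```python
# The schoolbook algorithm at the second level of analysis:
# Arabic-numeral strings as the representation, right-to-left
# digit addition with a carry as the algorithm.

def add_arabic(a, b):
    a, b = a[::-1], b[::-1]              # least significant digit first
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        d = carry
        d += int(a[i]) if i < len(a) else 0
        d += int(b[i]) if i < len(b) else 0
        out.append(str(d % 10))
        carry = d // 10                  # "carry" when the sum exceeds 9
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))

total = add_arabic("378", "45")
```

Note that the routine never touches numbers as such, only digit strings: the what (addition) is fixed by the first level, while this level fixes the representation and the how.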
There are three important points here. First, there is usually a wide choice of representation. Second, the choice of algorithm often depends rather critically on the particular representation that is employed. And third, even for a given fixed representation, there are often several possible algorithms for carrying out the same process. Which one is chosen will usually depend on any particularly desirable or undesirable characteristics that the algorithms may have; for example, one algorithm may be much more efficient than another, or another may be slightly less efficient but more robust (that is, less sensitive to slight inaccuracies in the data on which it must run). Or again, one algorithm may be parallel, and another, serial. The choice, then, may depend on the type of hardware or machinery in which the algorithm is to be embodied physically.
This brings us to the third level, that of the device in which the process is to be realized physically. The important point here is that, once again, the same algorithm may be implemented in quite different technologies. The child who methodically adds two numbers from right to left, carrying a digit when necessary, may be using the same algorithm that is implemented by the wires and transistors of the cash register in the neighborhood supermarket, but the physical realization of the algorithm is quite different in these two cases. Another example: Many people have written computer programs to play tic-tac-toe, and there is a more or less standard algorithm that cannot lose. This algorithm has in fact been implemented by W. D. Hillis and B. Silverman in a quite different technology, in a computer made out of Tinkertoys, a children’s wooden building set. The whole monstrously ungainly engine, which actually works, currently resides in a museum at the University of Missouri in St. Louis.
Some styles of algorithm will suit some physical substrates better than others. For example, in conventional digital computers, the number of connections is comparable to the number of gates, while in a brain, the number of connections is much larger (× 10⁴) than the number of nerve cells. The underlying reason is that wires are rather cheap in biological architecture, because they can grow individually and in three dimensions. In conventional technology, wire laying is more or less restricted to two dimensions, which quite severely restricts the scope for using parallel techniques and algorithms; the same operations are often better carried out serially.
We can summarize our discussion in something like the manner shown in Figure 4.1, which illustrates the different levels at which an information-processing device must be understood before one can be said to have understood it completely. At one extreme, the top level, is the abstract computational theory of the device, in which the performance of the device is characterized as a mapping from one kind of information to another, the abstract properties of this mapping are defined precisely, and its appropriateness and adequacy for the task at hand are demonstrated. In the center is the choice of representation for the input and output and the algorithm to be used to transform one into the other. And at the other extreme are the details of how the algorithm and representation are realized physically—the detailed computer architecture, so to speak. These three levels are coupled, but only loosely. The choice of an algorithm is influenced, for example, by what it has to do and by the hardware in which it must run. But there is a wide choice available at each level, and the explication of each level involves issues that are rather independent of the other two.
Figure 4.1
The three levels at which any machine carrying out an information-processing task must be understood.
Each of the three levels of description will have its place in the eventual understanding of perceptual information processing, and of course they are logically and causally related. But an important point to note is that since the three levels are only rather loosely related, some phenomena may be explained at only one or two of them. This means, for example, that a correct explanation of some psychophysical observation must be formulated at the appropriate level. In attempts to relate psychophysical problems to physiology, too often there is confusion about the level at which problems should be addressed. For instance, some are related mainly to the physical mechanisms of vision—such as afterimages (for example, the one you see after staring at a light bulb) or such as the fact that any color can be matched by a suitable mixture of the three primaries (a consequence principally of the fact that we humans have three types of cones). On the other hand, the ambiguity of the Necker cube (Figure 4.2) seems to demand a different kind of explanation. To be sure, part of the explanation of its perceptual reversal must have to do with a bistable neural network (that is, one with two distinct stable states) somewhere inside the brain, but few would feel satisfied by an account that failed to mention the existence of two different but perfectly plausible three-dimensional interpretations of this two-dimensional image.
Figure 4.2
The so-called Necker illusion, named after L. A. Necker, the Swiss naturalist who developed it in 1832. The essence of the matter is that the two-dimensional representation (a) has collapsed the depth out of a cube and that a certain aspect of human vision is to recover this missing third dimension. The depth of the cube can indeed be perceived, but two interpretations are possible, (b) and (c). A person’s perception characteristically flips from one to the other.
For some phenomena, the type of explanation required is fairly obvious. Neuroanatomy, for example, is clearly tied principally to the third level, the physical realization of the computation. The same holds for synaptic mechanisms, action potentials, inhibitory interactions, and so forth. Neurophysiology, too, is related mostly to this level…But one has to exercise extreme caution in making inferences from neurophysiological findings about the algorithms and representations being used, particularly until one has a clear idea about what information needs to be represented and what processes need to be implemented.
Psychophysics, on the other hand, is related more directly to the level of algorithm and representation. Different algorithms tend to fail in radically different ways as they are pushed to the limits of their performance or are deprived of critical information. As we shall see, primarily psychophysical evidence proved to Poggio and myself that our first stereo-matching algorithm (Marr and Poggio, 1976) was not the one that is used by the brain, and the best evidence that our second algorithm (Marr and Poggio, 1979) is roughly the one that is used also comes from psychophysics. Of course, the underlying computational theory remained the same in both cases, only the algorithms were different.
Psychophysics can also help to determine the nature of a representation. The work of Roger Shepard (1975), Eleanor Rosch (1978), or Elizabeth Warrington (1975) provides some interesting hints in this direction. More specifically, Stevens (1979) argued from psychophysical experiments that surface orientation is represented by the coordinates of slant and tilt, rather than (for example) the more traditional (p, q) of gradient space. He also deduced from the uniformity of the size of errors made by subjects judging surface orientation over a wide range of orientations that the representational quantities used for slant and tilt are pure angles and not, for example, their cosines, sines, or tangents.
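The relation between the two coordinate systems that Stevens compared can be written down directly. The sketch below is an illustrative addition (the conversion formulas are the standard ones, not code from the text): it maps the traditional gradient-space coordinates (p, q), the partial derivatives of depth with respect to the image axes, into the slant and tilt angles.

```python
# Converting gradient space (p, q) to slant/tilt coordinates.
# Slant is the angle between the surface normal and the line of sight;
# tilt is the image-plane direction of steepest change in depth.

import math

def gradient_to_slant_tilt(p, q):
    slant = math.atan(math.hypot(p, q))  # radians; 0 = frontoparallel
    tilt = math.atan2(q, p)              # radians; direction of steepest descent
    return slant, tilt

# A frontoparallel surface has zero gradient and hence zero slant:
assert gradient_to_slant_tilt(0.0, 0.0) == (0.0, 0.0)

# A surface receding equally in x and y: tilt is 45 degrees.
slant, tilt = gradient_to_slant_tilt(1.0, 1.0)
print(math.degrees(slant), math.degrees(tilt))  # roughly 54.7 and 45.0
```

Stevens's claim, on this rendering, is that the quantities the visual system stores are the angles slant and tilt themselves, not (p, q) and not trigonometric functions of the angles.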
More generally, if the idea that different phenomena need to be explained at different levels is kept clearly in mind, it often helps in the assessment of the validity of the different kinds of objections that are raised from time to time. For example, one favorite is that the brain is quite different from a computer because one is parallel and the other serial. The answer to this, of course, is that the distinction between serial and parallel is a distinction at the level of algorithm; it is not fundamental at all—anything programmed in parallel can be rewritten serially (though not necessarily vice versa). The distinction, therefore, provides no grounds for arguing that the brain operates so differently from a computer that a computer could not be programmed to perform the same tasks.
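The point that the serial/parallel distinction is algorithmic rather than fundamental can be illustrated with a toy sketch (an editorial addition, not from the text): a conceptually parallel elementwise update, in which every cell is computed from the old state simultaneously, can always be replayed one cell at a time against a frozen copy of that state.

```python
# A "parallel" update and its serial rewrite compute the same function.

def parallel_style_step(state, f):
    # Conceptually simultaneous: every output depends only on the old state.
    return [f(state, i) for i in range(len(state))]

def serial_rewrite_step(state, f):
    old = list(state)              # freeze the previous state
    out = []
    for i in range(len(old)):      # visit cells strictly one at a time
        out.append(f(old, i))
    return out

# Example update rule: each cell becomes the sum of its two neighbors
# (with wraparound at the ends).
rule = lambda s, i: s[i - 1] + s[(i + 1) % len(s)]

s = [1, 2, 3, 4]
assert parallel_style_step(s, rule) == serial_rewrite_step(s, rule)
```

The serial version is slower, of course, but it computes exactly the same mapping, which is all that matters at the level of computational theory.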
Although algorithms and mechanisms are empirically more accessible, it is the top level, the level of computational theory, which is critically important from an information-processing point of view. The reason for this is that the nature of the computations that underlie perception depends more upon the computational problems that have to be solved than upon the particular hardware in which their solutions are implemented. To phrase the matter another way, an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied.
In a similar vein, trying to understand perception by studying only neurons is like trying to understand bird flight by studying only feathers: It just cannot be done. In order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense. More to the point, as we shall see, we cannot understand why retinal ganglion cells and lateral geniculate neurons have the receptive fields they do just by studying their anatomy and physiology. We can understand how these cells and neurons behave as they do by studying their wiring and interactions, but in order to understand why the receptive fields are as they are—why they are circularly symmetrical and why their excitatory and inhibitory regions have characteristic shapes and distributions—we have to know a little of the theory of differential operators, band-pass channels, and the mathematics of the uncertainty principle.
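The operator alluded to here can be made slightly more concrete. As a hedged illustration (the formula is the standard negated second derivative of a Gaussian, not code from the text), the one-dimensional "Mexican hat" profile below has an excitatory center flanked by inhibitory surrounds, which is the qualitative shape of the receptive fields in question.

```python
# One-dimensional cross-section of a center-surround operator:
# the negated second derivative of a Gaussian.

import math

def mexican_hat(x, sigma=1.0):
    # -d^2/dx^2 of exp(-x^2 / (2 sigma^2)):
    # positive (excitatory) at the center, negative (inhibitory) in the surround.
    g = math.exp(-x * x / (2.0 * sigma * sigma))
    return -(x * x - sigma * sigma) / sigma**4 * g

assert mexican_hat(0.0) > 0                 # excitatory center
assert mexican_hat(2.0) < 0                 # inhibitory surround
assert mexican_hat(1.5) == mexican_hat(-1.5)  # symmetric about the center
```

The shape is fixed by the mathematics of differential operators and band-pass filtering, which is the sense in which the why-question about receptive fields is answered at the level of computational theory rather than by anatomy alone.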
Perhaps it is not surprising that the very specialized empirical disciplines of the neurosciences failed to appreciate fully the absence of computational theory; but it is surprising that this level of approach did not play a more forceful role in the early development of artificial intelligence. For far too long, a heuristic program for carrying out some task was held to be a theory of that task, and the distinction between what a program did and how it did it was not taken seriously. As a result, (1) a style of explanation evolved that invoked the use of special mechanisms to solve particular problems, (2) particular data structures, such as the lists of attribute-value pairs called property lists in the LISP programming language, were held to amount to theories of the representation of knowledge, and (3) there was frequently no way to determine whether a program would deal with a particular case other than by running the program.
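For readers who have not met the LISP structure mentioned above: a property list is simply a flat alternation of attribute names and values. The sketch below (an illustrative addition, rendered in Python rather than LISP) shows how little machinery such a structure involves, which underlines the point that treating it as a theory of knowledge representation conflated a data structure with a theory.

```python
# A minimal property-list lookup, mimicking LISP's
# (attribute value attribute value ...) alternation.

def plist_get(plist, key, default=None):
    # Walk the list two entries at a time: name, then value.
    for i in range(0, len(plist) - 1, 2):
        if plist[i] == key:
            return plist[i + 1]
    return default

block = ["color", "red", "shape", "cube", "supported-by", "table"]
assert plist_get(block, "shape") == "cube"
assert plist_get(block, "weight") is None
```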
Failure to recognize this theoretical distinction between what and how also greatly hampered communication between the fields of artificial intelligence and linguistics. Chomsky’s (1965) theory of transformational grammar is a true computational theory in the sense defined earlier. It is concerned solely with specifying what the syntactic decomposition of an English sentence should be, and not at all with how that decomposition should be achieved. Chomsky himself was very clear about this—it is roughly his distinction between competence and performance, though his idea of performance did include other factors, like stopping in midutterance—but the fact that his theory was defined by transformations, which look like computations, seems to have confused many people. Winograd (1972b), for example, felt able to criticize Chomsky’s theory on the grounds that it cannot be inverted and so cannot be made to run on a computer; I have heard reflections of the same argument made by Chomsky’s colleagues in linguistics as they turn their attention to how grammatical structure might actually be computed from a real English sentence.
The explanation is simply that finding algorithms by which Chomsky’s theory may be implemented is a completely different endeavor from formulating the theory itself. In our terms, it is a study at a different level, and both tasks have to be done. This point was appreciated by Marcus (1980), who was concerned precisely with how Chomsky’s theory can be realized and with the kinds of constraints on the power of the human grammatical processor that might give rise to the structural constraints in syntax that Chomsky found. It even appears that the emerging “trace” theory of grammar (Chomsky and Lasnik, 1977) may provide a way of synthesizing the two approaches—showing that, for example, some of the rather ad hoc restrictions that form part of the computational theory may be consequences of weaknesses in the computational power that is available for implementing syntactical decoding.
Our survey of this new, computational approach to vision is now complete. Although there are many gaps in the account, I hope that it is solid enough to establish a firm point of view about the subject and to prompt the reader to begin to judge its value. In this brief chapter, I shall take a very broad view of the whole approach, inquiring into its most important general features and how they relate to one another, and trying to say something about the style of research that this approach implies. It is convenient to divide the discussion into four main points.
The first point is one that we have met throughout the account—the notion of different levels of explanation. The central tenet of the approach is that to understand what vision is and how it works, an understanding at only one level is insufficient. It is not enough to be able to describe the responses of single cells, nor is it enough to be able to predict locally the results of psychophysical experiments. Nor is it enough even to be able to write computer programs that perform approximately in the desired way. One has to do all these things at once and also be very aware of the additional level of explanation that I have called the level of computational theory. The recognition of the existence and importance of this level is one of the most important aspects of this approach. Having recognized this, one can formulate the three levels of explanation explicitly (computational theory, algorithm, and implementation), and it then becomes clear how these different levels are related to the different types of empirical observation and theoretical analysis that can be conducted. I have laid particular stress on the level of computational theory not because I regard it as inherently more important than the other two levels—the real power of the approach lies in the integration of all three levels of attack—but because it is a level of explanation that has not previously been recognized and acted upon. It is therefore probably one of the most difficult ideas for newcomers to the field to grasp, and for this reason alone its importance should not be understated in any introductory book, such as this is intended to be.
The second main point is that by taking an information-processing point of view, we have been able to formulate a rather clear overall framework for the process of vision. This framework is based on the idea that the critical issues in vision revolve around the nature of the representations used—that is, the particular characteristics of the world that are made explicit during vision—and the nature of the processes that recover these characteristics, create and maintain the representations, and eventually read them. By analyzing the spatial aspects of the problem of vision, we arrived at an overall framework for visual information processing that hinges on three principal representations: (1) the primal sketch, which is concerned with making explicit properties of the two-dimensional image, ranging from the amount and disposition of the intensity changes there to primitive representations of the local image geometry and including at the more sophisticated end a hierarchical description of any higher-order structure present in the underlying reflectance distributions; (2) the 2½-D sketch, which is a viewer-centered representation of the depth and orientation of the visible surfaces and includes contours of discontinuities in these quantities; and (3) the 3-D model representation, whose important features are that its coordinate system is object centered, that it includes volumetric primitives (which make explicit the organization of the space occupied by an object and not just its visible surfaces), and that primitives of various size are included, arranged in a modular, hierarchical organization.
The third main point concerns the study of processes for recovering the various aspects of the physical characteristics of a scene from images of it. The critical act in formulating computational theories for such processes is the discovery of valid constraints on the way the world behaves that provide sufficient additional information to allow recovery of the desired characteristic. The power of this type of analysis resides in the fact that the discovery of valid, sufficiently universal constraints leads to conclusions about vision that have the same permanence as conclusions in other branches of science.
Furthermore, once a computational theory for a process has been formulated, algorithms for implementing it may be designed, and their performance compared with that of the human visual processor. This allows two kinds of results. First, if performance is essentially identical, we have good evidence that the constraints of the underlying computational theory are valid and may be implicit in the human processor; second, if a process matches human performance, it is probably sufficiently powerful to form part of a general purpose vision machine.
The final point concerns the methodology or style of this type of approach, and it involves two main observations. First, the duality between representations and processes, which is set out explicitly in Figure 4.3, often provides a useful aid to thinking how best to proceed when studying a particular problem. In the study both of representations and of processes, general problems are often suggested by everyday experience or by psychophysical or even neurophysiological findings of a quite general nature. Such general observations can often lead to the formulation of a particular process or representational theory, specific examples of which can be programmed or subjected to detailed psychophysical testing. Once we have sufficient confidence in the correctness of the process or representation at this level, we can inquire about its detailed implementation, which involves the ultimate and very difficult problems of neurophysiology and neuroanatomy.
Figure 4.3
Relationships between representations and processes.
The second observation is that there is no real recipe for this type of research—even though I have sometimes suggested that there is—any more than there is a straightforward procedure for discovering things in any other branch of science. Indeed, part of the fun is that we never really know where the next key is going to come from—a piece of daily experience, the report of a neurological deficit, a theorem about three-dimensional geometry, a psychophysical finding in hyperacuity, a neurophysiological observation, or the careful analysis of a representational problem. All these kinds of information have played important roles in establishing the framework that I have described, and they will presumably continue to contribute to its advancement in an interesting and unpredictable way. I hope only that these observations may persuade some of my readers to join in the adventures we have had and to help in the long but rewarding task of unraveling the mysteries of human visual perception.
Corey J. Maley
2023
Let us begin with two orthodoxies of cognitive science, both conveniently available in slogan form. Christof Koch (1999) begins his seminal work in computational neuroscience with the declaration “The brain computes!” Jerry Fodor (1981) famously proclaimed “There is no computation without representation.” There are detractors from each of these views, of course, but cognitive scientists generally assume a close connection between cognition and computation, and between computation and representation. Taken together, these ideas form the foundational assumption of cognitive science: mentality requires computation, which requires representation. Understanding representation, and in turn, computation, is thus the sine qua non of cognitive science’s success.
Although I happen to agree with these orthodoxies, my aim in what follows is not to make the case for either. Instead, I want to show that the landscape of representational types—particularly those that can underwrite computation—is larger than what nearly all cognitive scientists have recognized. In particular, the analog side of this space has been under-explored and under-theorized.
Much of the problem is that, conceptually, computation is almost universally taken to be synonymous with specifically digital computation: the kind of computation that traffics in digital representation. That there could be some other legitimate kind of computation has gone largely unnoticed and unquestioned. Furthermore, insofar as “digital” is contrasted with anything (whether representation, computation, or something else entirely), it is contrasted with “analog” in an unilluminating and ultimately misleading way. The received view has it that “digital” is synonymous with “discrete,” and that “analog” is synonymous with “continuous.” There is a kernel of truth here, but unfortunately, that small kernel has obscured the more fundamental difference between analog and digital representation, and in turn, computation.
In this chapter, I will try to begin remedying this situation. Specifically, I will show how to correctly understand analog representation, and do so in a way that helps us correctly understand analog computation. To do this, I will first provide some context for how the received view came to be. This will take the form of a brief discussion of computing machines in the twentieth century. Next, I will present a typology of representations, showing where analog and digital representation (rightly construed) fit within the broader landscape of more fundamental distinctions between representational types. Thus, while it is true that analog computers manipulate analog representations and digital computers manipulate digital ones, understanding what, precisely, “analog” and “digital” are about (which is missed by the received view) will then enable a clearer discussion of what analog and digital computation is about. Finally, I will suggest how a correct understanding of analog computation might make computation more palatable to researchers in the 4E tradition—particularly those in embodied and enactive cognition—who have otherwise eschewed computation.
Because much of this chapter is devoted to correcting what I take to be an error in how fundamental divisions between computational and representational types are currently understood, it is helpful to take a moment to see how we got here. Interestingly, however, there is not as much written on the history of computation as one might expect. Mahoney (2011) notes that historians of technology have written relatively little on the history of digital computing machines.1 The authors of what has been written are largely computer scientists and mathematicians, which presents problems to the historian and philosopher of computation:
While it is firsthand and expert, it is also guided by the current state of knowledge and bound by the professional culture. That is, its authors take as givens (often technical givens) what a more critical, outside viewer might see as choices. Reading their accounts makes it difficult to see the alternatives, as the authors themselves lose touch with a time when they did not know what they know now. (Mahoney, 2011, 22–23)
This problem is particularly acute when it comes to analog computing machines, where even less has been written (Mindell, 2002). Nyce (1996, 3) puts the point well: “Because digital computers and computation have been so successful, they have influenced how we think about both computers as machines and computation as a process—so much so, it is difficult today to reconstruct what analog computing was all about.” A complete history will have to wait for another time (and its own book); for now, we can sketch that history just enough to get a feel for what is to come.
In the late nineteenth and early twentieth century, a wide variety of computing machines were used in scientific, engineering, and industrial contexts. These included adding machines, slide rules of various types, and mechanical analog computers. In general, this hodgepodge of mechanisms was not subject to any systematic theoretical study: different problems required different solutions, and most machines were built to solve a particular kind of task. For example, mechanical adders worked much like cash registers, allowing human computers2 to perform basic arithmetical calculations. Tabulating machines allowed operators to analyze census data. On naval warships, mechanical fire-control computing machines determined how to position the ship’s guns to accurately hit moving targets (Mindell, 2002).
One exception to this lack of systematic analysis was the work done on the differential analyzer, a kind of mechanical analog computer developed by Vannevar Bush at MIT. Unlike many other computing machines that were created for a single, special purpose, the differential analyzer could be reconfigured to compute solutions to a large number of different kinds of mathematical problems (Bush, 1931). These mechanical analog computers were mostly (but not exclusively) used for solving problems that involved differential equations. Many engineering and scientific problems are amenable to analysis in terms of systems of differential equations, but large classes of these systems have no analytic solution. Thus, they can only be “solved” by tedious numerical calculation, or by some kind of mechanical simulation. The differential analyzer proved a useful new way to study problems of this sort.
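What the differential analyzer mechanized can be mimicked numerically. The sketch below is an illustrative addition (the equation y'' = -y is chosen for checkability, not taken from the text): it chains two Euler integration steps in roughly the way the machine chained two mechanical integrators, with the output of each integrator feeding the input of the other.

```python
# A crude numerical analogue of a two-integrator differential-analyzer
# setup for the equation y'' = -y, with y(0) = 0 and y'(0) = 1,
# whose exact solution is y(t) = sin(t).

import math

def integrate_oscillator(t_end, dt=1e-4):
    y, v = 0.0, 1.0                      # position and velocity
    t = 0.0
    while t < t_end:
        # One integrator accumulates v into y; the other feeds -y into v.
        y, v = y + v * dt, v - y * dt
        t += dt
    return y

# After a quarter period the result should be close to sin(pi/2) = 1.
assert abs(integrate_oscillator(math.pi / 2) - 1.0) < 1e-2
```

On a real differential analyzer the accumulation was continuous rather than stepped, which is exactly the sense in which the machine "solved" equations that resist analytic solution.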
Purely mechanical analog computers eventually gave way to electromechanical and electronic computers. Variables in mechanical analog computers would be represented by quantities such as the left-right displacement of a shaft, the angle of a gear, or the running total number of rotations of a roller or drum: in other words, quantities of the problem to be solved were represented by physical quantities in the computing machine, and variation in those quantities was represented by physically moving those parts. Electromechanical and electronic computers used electrical quantities, such as voltage and resistance, to represent the quantity of a variable. And, like the mechanical analog computer, changes in the quantity of a variable were represented by a change in those electrical quantities. Because fully electronic analog computing machines had no moving parts, they were more easily programmed and maintained than their mechanical and electromechanical counterparts. These machines eventually replaced the mechanical and electromechanical entirely, becoming the standard type of analog computer by the 1970s.
The majority of the problems solved on these analog computing machines were continuous in nature, so the physical quantities that the machines used to represent variables were likewise continuous in nature (e.g., angles, rotations, and voltages are all naturally considered as continuous quantities). However, there are important exceptions: some of the problems studied would require, for example, the absolute value function, or step functions. In those cases, the analog computing machines would use discontinuous variables to represent those discontinuities (James et al., 1971; Maley, forthcoming).
Although digital computers have now completely replaced analog computers, it took a few decades for them to do so. The earliest digital computers were larger, slower, more error-prone, and more expensive than their analog counterparts. But once they were on equal footing, it became important to distinguish the two types to potential buyers and users. At that point in history, both the analog and digital computing machines were electronic. However, the analog ones (mostly) used continuous voltages to represent their variables, while the digital ones used exclusively discrete voltages. The occasional exception where analog computers used discontinuous (i.e., discrete) voltages was overlooked. For digital computers, there were no exceptions: they always used discrete voltages. This was one important source of the “analog/digital” distinction: analog machines were continuous (or, at least, very often so), while digital machines were discrete.
There is certainly more to say about the history of this division, but this is part of the story of how we arrived at the received view of the analog/digital distinction. Again, according to this view, “analog” is synonymous with “continuous,” and thus analog representations vary continuously, and analog computers are just computers that use continuous representations. On the other side, “digital” is synonymous with “discrete,” and thus digital computers are just computers that use discrete representations.
Now, even if it were true that electronic analog computers always used continuous voltages (as the received view would have it, by definition), the most important difference between analog and digital computers is not simply a matter of continuity versus discreteness. More important—as I will spell out in detail in the next section—is how the voltages do the representing.3 In short, analog representations (and, in turn, analog computers) represent the magnitudes of numbers via the magnitudes of physical quantities; digital representations (and, in turn, digital computers) represent the names of numbers via digits, where the digits are then represented by variations in physical quantities. Much more will be said about this below, but for now, I simply note that this crucial difference in how these computing machines and their representations operate was glossed over by what became the received analog/digital distinction. The received view has obscured the fact that the difference between the analog and digital is more interesting than the difference between types of peanut butter, where one is chunky and the other is smooth. Rather, there are two fundamentally different ways of representing, one of which essentially takes advantage of the physical nature of physical representations, and another that abstracts away from them.
According to the framework I offer here, there are two fundamental distinctions to be made between representational types. First, there are those that are analog and those that are not. Second, there are representations that are continuous and those that are not (i.e., discrete).4 On this view (contra the received view), “analog” is not synonymous with “continuous,” a point first made explicit by Lewis (1971), then defended and extended by Maley (2011) (this point is also implicit in Copeland, 1997). What were originally theoretical reasons for rejecting this synonymy have been borne out in examples of extant (albeit historical) analog computers (Maley, forthcoming). Moreover, “digital” is not synonymous with “discrete.” Instead, “digital” turns out to be just one of many non-analog ways of representing numbers (although its complexity is obscured by its familiarity). In contemporary digital computers, it is this particular type of representation—and not mere discreteness—that is essential to understanding their operation qua computer.
Now, it should be said that some philosophers have defended the received view. Goodman (1968), for instance, defends a sophisticated version of the received view. Haugeland (1981) also defends the received view, bringing engineering considerations of approximation procedures into the conversation. Later defenses of the received view, or variations thereof, can be found in (Papayannopoulos, 2020; Katz, 2016; Schonbein, 2014). For now, however, I will set critiques of these views aside.
A bit of terminology is needed before we begin in earnest. In what follows, I will refer to the physical thing doing the representing simply as the representation, and I will refer to what it is that is being represented as the representandum. How closely this distinction aligns with the vehicle/content distinction depends on how “vehicle” and “content” are understood, which varies in the literature. Some understand vehicles to be physical, and others do not; some understand content to be only mental content, and others do not. In order to avoid possible confusion, I will use these new terms.
The first distinction made above—that between analog and non-analog representations—reflects two distinct ways of representing numbers. Suppose for a moment that mathematical Platonism is true—that numbers are real, abstract objects. In order to manipulate these abstract objects (as we are wont to do, both by hand and with computing machines), we must concretely represent them somehow or another. One way to do that is to represent numbers via their magnitudes: the number one is represented by some concrete, physical magnitude of one; the number two is represented by a concrete, physical magnitude of two; and so on. Thus, the number two could be represented by the length of a rod, where the rod is two meters long; or it could be represented by the electrical potential across a circuit element, where the potential is two volts. This is the way of the analog.
The second—completely distinct—way to represent numbers is to represent them by what we might call their names, where names are of different types. Some names are completely arbitrary. For example, “Drumthwacket” and “Graceland” are arbitrary names for particular buildings in the United States; similarly, “e” and “π” are arbitrary names for particular numbers. Almost no information about the referent of those names is given by the names themselves. Alternatively, some names have a type of structure. If we consider addresses to be names (as they sometimes are), then “123 Main St.” is a name that gives the location of a particular house in some city or another, according to certain conventions. Similarly, “314” is a name that gives the value of a number in some base, according to certain conventions. This is the way of the non-analog.
In short, the important difference between these is that we can represent a number either by its magnitude or by its name; this is the difference between analog and non-analog representation (the left-right division in figure 5.1). This difference is then reflected in how these are physically represented. With respect to their physical instantiations, analog and non-analog representation can be characterized as a difference between first-order and second-order representation (Maley, 2021). This way of understanding the two helps to make clear the differences between the types of computation that use those representations, discussed in the next section.
Figure 5.1
Representational types. Analog representations can be continuous or discrete; non-analog representations are only discrete. Non-analog representations are further divided into symbolic and numerical representations, which can be divided yet further (see figure 5.3).
Let us look at each of these ways of representing in more detail, starting with analog representation. I will argue that being analog (or not) is fundamental, which will then enable me to show why being non-analog is not the same thing as being digital.
Because we are concerned here with physical computers of some kind or another (either natural or artificial), representations are first and foremost physical objects of some kind or another. Representations do their representing via one (or more) of their physical properties; the different ways to spell out “via” is precisely what I argue is most fundamental about distinguishing representational types.
As mentioned above, one way for a representation to represent a number—the analog way—is for the amount, or quantity, of one of its properties to covary with the representandum. For example, one can use a lump of clay to represent a number by using the temperature of the clay in degrees Celsius, or the mass of the clay in decagrams. To make the example concrete, let us suppose that we want to represent the number fifty.5 Thus, to represent fifty via the clay’s mass, the clay would have a mass of fifty decagrams. If we want to increase the representandum to fifty-one, we need to increase the mass to fifty-one decagrams. Thus, as the representandum changes, the mass—the property doing the representing—changes in turn. Specifically, an increase in the representandum requires an increase in the mass of the representation; a decrease in the representandum requires a decrease in the mass of the representation. Thus, we have a quantity—one of the properties of the representation—representing the quantity of the representandum.6 The magnitude of the representing quantity just is the magnitude of the number being represented; thus, what is “analog” about this type of representation is the analogy between the magnitude of the representation and the magnitude of the representandum.
An important part of this characterization is that it can hold for representations and representanda that are either continuous or discrete. For our clay example, suppose we are only interested in representing whole numbers, and suppose we only have single-decagram blobs of clay to add to our representation (perhaps the number we are representing is a count of a person’s age in years). In such a case, we would still have an analog representation, even though the property doing the representing and the representandum both vary in discrete, rather than continuous, steps. What matters is not whether continuity is involved in either the representation or representandum, but that the property of the representation that does the representing covaries with the representandum. In many cases, the covariance will be linear, but at a minimum it must be monotonic: an increase/decrease in the magnitude of the representandum is characterized by some increase/decrease in the magnitude of the relevant property of the representation.7
Non-analog representation, on the other hand, does not work this way. Using our lump of clay, we could create a non-analog representation of the number fifty by separating the clay into two smaller lumps, rolling them out, and forming one into the numeral “5” and the other the numeral “0.” Put in the right order, this would represent the number fifty. Alternatively, we could form our lump of clay into the single numeral “L,” the Roman numeral for fifty.8 Now, while it must be true that some physical property (or pattern thereof) of the representation must do the representing (how else could a physical representation represent anything?), it is not the magnitude of the physical property doing the representing that reflects the magnitude of the representandum. Rather, the physical properties of the representation are abstracted in a way so that it is not the magnitude of the property that matters, but only variation among arbitrary values or patterns of the property as the representandum varies.
Let us look at another example, this time in the context of electronic representations of the kind we often find on the dashboards of contemporary cars. The left side of figure 5.2 depicts an analog representation as it would change from representing six to seven, and the right side depicts a single numerical, non-analog representation doing the same thing; we can imagine each of these as representing, say, the remaining amount of fuel in a car (in gallons). Note that the segments in the analog display could indicate that the display is composed of discrete elements, or they may simply be marks meant to help read a continuously varying level of (say) liquid (i.e., it may be discrete or continuous). On the characterization of analog representation advanced here, it does not matter. Either way, it is the magnitude (in this case, height) of the gauge (depicted here as gray) that determines the representandum. Thus, an increase (or decrease) in height corresponds to an increase (or decrease) in the representandum.
Figure 5.2
Left: Analog representation of six and seven. Right: Non-analog representation of six and seven.
As for the numerical representation depicted on the right side of figure 5.2, the magnitude or quantity of the material is irrelevant to the representation qua representation. What matters, instead, is that certain segments are activated in specified patterns. As the representandum increases from six to seven, there is no magnitude or quantity that increases in the representation. Instead, the elements that are activated simply change from one pattern to another. In particular, segments D, E, F, and G are deactivated, segment B is activated, and segments A and C are unchanged. Once again, in the case of the analog representation, there is a monotonic change in the magnitude that does the representing as the representandum increases (or decreases); in the digital case, it is not the magnitude of any physical property per se that does any representing, but only a pattern of physical quantities that are capable of changing (in fact, the number of activated segments decreases in this example as we go from six to seven but would then increase if we went from seven to eight).
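The pattern-change just described can be made concrete with a short sketch. This is an illustrative Python model of the segment patterns named in the text (the segment labels A–G follow the example above; the dictionary itself is an assumption made for illustration, not a claim about any actual display's wiring):

```python
# Segment patterns for the digits discussed in the text. What represents
# the number is the *pattern* of active segments, not any magnitude:
# note that six uses more active segments than seven.
SEGMENTS = {
    6: {"A", "C", "D", "E", "F", "G"},
    7: {"A", "B", "C"},
    8: {"A", "B", "C", "D", "E", "F", "G"},
}

def transition(old, new):
    """Return which segments are deactivated, activated, and unchanged."""
    off = SEGMENTS[old] - SEGMENTS[new]
    on = SEGMENTS[new] - SEGMENTS[old]
    same = SEGMENTS[old] & SEGMENTS[new]
    return off, on, same

off, on, same = transition(6, 7)
print(sorted(off))   # deactivated: ['D', 'E', 'F', 'G']
print(sorted(on))    # activated: ['B']
print(sorted(same))  # unchanged: ['A', 'C']
```

Running the transition from six to seven reproduces the description above: four segments turn off, one turns on, and no quantity in the display tracks the magnitude of the representandum.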
The lesson from these examples generalizes. The defining feature of any analog representation on this view is that the magnitude or amount of the physical property that does the representing increases (or decreases) systematically (i.e., monotonically) with increases (or decreases) in the representandum. In the case of a non-analog representation, the physical property that does the representing only needs to change in some way or another as the representandum increases or decreases. There need be no systematic change in the magnitude of the representationally relevant property of the representation as the representandum systematically varies.
As illustrated in figure 5.1, there are some non-analog representations that are worth distinguishing from numerals. Numerals are special in that they combine to form complex representations, whereas non-numerals do not. Examples of non-numeral representations include “π” and “e,” which represent the mathematical constants pi and Euler’s number. These symbols could be represented with the same kind of segmented display as the numerical display above. However, it would be a mistake to call these digital: these kinds of symbols are not concatenated in the same ways that numerals are to form complex representations (i.e., numerical representations, which will be discussed in more detail below).
One final point is worth noting. Although they may be conceptually possible, there are no continuous, non-analog representations. To be sure, there are, in a sense, non-analog representations of continuous representanda; the digital expansion of a real number is one example.9 However, there are no representations that themselves vary continuously in the property that does the representing, but where that property does not represent in an analog way (hence the gray box in the lower right quadrant of figure 5.1). Again, many representations have physical properties that vary continuously but are then used as non-analog representations. Binary numerals as they are implemented in contemporary digital computers, represented by voltage, are an example. Although the voltage varies continuously as it changes between different values (e.g., from a “low” value of around zero volts to a “high” value of around five volts), we discretize that voltage (basically rounding it to zero or five) and treat it as though it has only two values: the low value represents the numeral “0,” and the high value represents the numeral “1.”
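The discretization step can be sketched in a couple of lines. This is a toy model only: the five-volt logic level and the midpoint threshold are illustrative assumptions, not a specification of any real circuit family:

```python
# A toy discretization of a continuously varying voltage: any reading
# below the midpoint is treated as the numeral "0", any reading above
# it as the numeral "1". The 5 V "high" level is an illustrative choice.
HIGH = 5.0

def read_bit(voltage):
    """Round a measured voltage to one of the two logic values."""
    return "1" if voltage >= HIGH / 2 else "0"

print(read_bit(0.3))  # a noisy "low" still reads as "0"
print(read_bit(4.8))  # a noisy "high" still reads as "1"
```

The point of the sketch is that the continuous physical variation is deliberately thrown away: only the two discretized values do any representing.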
Perhaps the most interesting thing about certain non-analog representations is that they can be combined in various ways to form complex representations. Among these are the familiar digital representations, most important for our aims here, but there are others worth mentioning. Following Chrisomalis (2020), I will call these numerical representations. There are several families of numerical representations, all quite interesting, but I will focus the discussion here on digital representations (which Chrisomalis categorizes as a type of cumulative positional numerical system). Historically, this kind of representational scheme is not even the most common among different cultures. However, given that this scheme is the basis of contemporary digital computation, it is of particular philosophical importance for understanding extant computational systems.
Figure 5.3
Some kinds of numerical representations.
In numerical representational schemes, individual numerals are combined in different ways, sometimes with auxiliary symbols, to form representations of new numbers. The Roman numeral scheme, for example, takes each numeral to have a certain value, which is the same value that each numeral has on its own. Thus, “I” represents one; “V” represents five, and so on. Numerals are written so that the largest-valued ones are left-most, decreasing to the right. The number represented by the entire string of numerals is the sum of the numbers represented by the individual numerals. Thus, “XVI” consists of the numerals representing ten, five, and one; the whole string thus represents sixteen. A slight complication arises with the convention in which a numeral followed by a larger numeral (on the right) is taken to mean that the smaller numeral is to be subtracted from the larger. Thus, “XIV” represents fourteen.
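The additive rule, together with the subtractive complication, can be captured in a minimal evaluator. This Python sketch assumes well-formed numerals and does no validity checking; it is meant only to exhibit the scheme's structure:

```python
# A minimal evaluator for the Roman scheme described above: each numeral
# contributes its own value, summed left to right, except that a numeral
# standing before a larger one is subtracted (so "XIV" is fourteen).
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    total = 0
    for i, ch in enumerate(s):
        v = VALUES[ch]
        # Subtractive convention: a smaller numeral before a larger one.
        if i + 1 < len(s) and VALUES[s[i + 1]] > v:
            total -= v
        else:
            total += v
    return total

print(roman_to_int("XVI"))  # 16
print(roman_to_int("XIV"))  # 14
print(roman_to_int("DII"))  # 502
```

Note that the representandum of the whole string is a function of the representanda of the individual numerals, which is exactly what makes this a numerical scheme in the sense of the text.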
Another example is the Greek numeral scheme. In this scheme, the first nine Greek letters of the alphabet represent the values one through nine; the next nine letters represent the values ten through ninety; the last nine represent the values one hundred through nine hundred. Again, the number represented is the sum of the numbers represented by each individual numeral. Using the same left-to-right, largest-to-smallest convention, “ΦΛB” represents five hundred (Φ) plus thirty (Λ) plus two (B)—that is, five hundred thirty-two. Interestingly, in schemes like the Roman and Greek, there is no need for a numeral to represent zero; thus “DII” and “ΦB” each represent five hundred and two.
There are still other schemes that use multiplication instead of addition, as well as the divisional scheme of common fractions; Chrisomalis (2020) discusses a number of these in fascinating detail. What unites all of these schemes is that their members are complexes, built out of individual numerals, where the representandum of the entire representation is a function of the representanda of the individual numerals. So let us turn to the most familiar numerical scheme, the digital scheme, which is also the basis upon which virtually all of our understanding of physical computation is built.
The digital scheme, first articulated in the philosophical literature by Lewis (1971), represents a number as a string of numerals. The value that each numeral contributes to the representandum depends on its position in the string, as well as the base of the scheme. Most familiarly, we use base-10, which requires the numerals 0 through 9. The right-most digit (“digit” here is the term for a numeral in a particular place) contributes that many units to the representandum, the next (to the left) digit contributes that many tens, and so on. Thus, “272” has the representandum equal to:

(2 × 10²) + (7 × 10¹) + (2 × 10⁰).
More generally,10 for a base b other than 10, we would replace the 10 above with the number b, and the numerals would range from 0 to (b − 1). Interestingly, the digital scheme can be further extended by the use of a decimal place. Digits to the right of the decimal place can then represent negative powers of the base, allowing for real numbers to be represented.
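The positional rule generalizes directly to code. The following Python sketch evaluates a digit string in an arbitrary base; accumulating left to right and multiplying by the base at each step is arithmetically equivalent to summing digit × baseᵖᵒˢⁱᵗⁱᵒⁿ:

```python
# Evaluate a string of digits positionally, as described in the text:
# each digit contributes its value times the base raised to the power
# of its position (counting from zero at the right).
def positional_value(digits, base=10):
    total = 0
    for ch in digits:
        # Shifting the running total one place left, then adding the
        # new digit, implements the sum of digit * base**position.
        total = total * base + int(ch)
    return total

print(positional_value("272"))      # (2*10**2) + (7*10**1) + (2*10**0) = 272
print(positional_value("1101", 2))  # thirteen, written in base-2
```

The same function handles base-10 and base-2 alike, which previews the point below that only the choice of base separates our everyday scheme from the one used in digital computers.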
When it comes to computers, one useful fact about the digital scheme is that only two numerals are necessary to represent any representable number. Again, because we are concerned with physical representation and computation, this makes the implementation of a digital machine (relatively) simple: all one needs is a physical substrate that can be in two different physical states. The binary system is thus virtually universal in contemporary digital computers.
Before concluding this section, we should note the difference between symbolic representation and numerical representation. Symbolic representations are simply not used in numerical representations as understood here. By this I simply mean that individual symbolic representations are not concatenated or otherwise systematically combined with other symbolic representations to form new representations. This is, of course, a contingent fact: we could have used the string “π0” to represent (π × 10¹) + (0 × 10⁰) = 31.415926… But there are no extant representational schemes like this. Concatenations of symbols do occur in mathematical contexts, of course, but they are not numerical schemes.11
The point of going to the trouble of creating this typology of representations is to show how very different analog and digital representation really are. They are not merely two sides of the same coin. Instead, there is a more fundamental division between representations that are analog and those that are not analog. Numerals are a particular type of representation on the non-analog side, and digital representation is then one particular complex representational type built of numerals (among many others). Analog representations are not different because they are continuous but because they represent numbers in a fundamentally different way. Earlier I mentioned that this difference can also be characterized as first-order versus second-order representation; we can now see what that means.
Start with some physical property to be used as a representation (or part of a representation). As we saw above, with analog representation, the magnitude of the physical property represents the magnitude of the representandum. In an electronic analog computer, five volts represents the number five. Think of this as first-order representation. For digital representation, some value of the physical property represents a symbol (a numeral or some other symbol), and that symbol represents the name of the representandum (or part of the name). In an electronic digital computer, five volts12 represents the numeral “1.” In general, the numeral “1” is then a part of some digital representation, where it represents either the ones digit, or the twos digit, or the fours digit, and so on. Relative to analog representation, then, this is second-order representation.
Despite these differences—differences that the received view cannot capture, using only the resources of “discrete versus continuous”—analog and digital representation are alike in that both are the basis of different types of computation. Understanding analog and digital computation requires understanding analog and digital representation in just the way shown in this section: it is not merely the difference between continuous and discrete representations. If we accept the assumption that computation requires representation, we have been doing ourselves a disservice by looking only at digital representations, which occupy a relatively small space of the representational landscape. The analog space has been ignored, largely because its interesting features have been misunderstood. In the next section, we will begin to remedy this situation.
Just as it was for the case of analog and digital representation, it is a mistake to think that analog and digital computation are separated by whether the representations (or mechanisms, variables, or anything else) they use are continuous or discrete. Instead, what makes a computational machine analog or digital is just whether it uses analog or digital representations, in the specific sense outlined in section 5.2. This requires some unpacking, which is the purpose of this section. I will briefly examine some instances of digital and analog computation and mention some of the features that make each unique.
Virtually all contemporary computing machines are digital computers, based on a design originally proposed by von Neumann (1982). Contrary to popular accounts, what makes these computers specifically digital is the many ways that they take advantage of numbers represented digitally, in the sense presented above. Much has been written about the general ideas behind computation elsewhere, so here let us focus on what is specifically digital about digital computation. All data is ultimately stored as numbers represented digitally, including the very programs that are used to control the precise actions of the machine. So, for example, addition and multiplication in digital computers happens via algorithms familiar to us from elementary school: these operations are the result of digit-by-digit manipulations. The addition of numbers, for example, starts with adding the least-significant digits first, then adding the next-most-significant digits (plus a carry digit, if necessary), then the next-most, and so on. This kind of algorithm is only possible using the digital system with which we are all familiar—the only difference is, in a digital computer, a base-2 system is used, rather than base-10.
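The schoolbook algorithm just described can be sketched in a few lines of Python. This is an illustrative model of the digit-by-digit procedure, not a description of how any machine's adder circuitry is actually wired; digits are given least-significant first, and the same code works in any base:

```python
# The schoolbook addition algorithm described in the text: add digit by
# digit from the least-significant position, carrying into the next
# position when the sum reaches the base. Digital computers use base 2.
def add_digitwise(a, b, base=2):
    """Add two digit lists (least-significant digit first)."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        s = da + db + carry
        result.append(s % base)   # this position's digit
        carry = s // base         # carried into the next position
    if carry:
        result.append(carry)
    return result

# six (110 in binary) plus seven (111), least-significant digit first:
print(add_digitwise([0, 1, 1], [1, 1, 1]))  # [1, 0, 1, 1] = thirteen
```

The algorithm never consults the magnitudes of the numbers as wholes; it manipulates only the numerals and their positions, which is exactly the sense in which the operation is digital.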
Similarly, digital representations enable a systematic organization of the data. As a toy example, suppose we have thirty-two possible locations for storing data. In base-2, we can write those thirty-two as addresses numbered 00000, 00001, 00010, …, 11111. For a variety of reasons, it is more efficient to spread these locations among several different groups. So, suppose we have four different “chips,” each of which has eight individual locations.
Figure 5.4
Digital addressing.
When we need to access a particular address, we can use the two most-significant digits to locate a particular chip, then the three least-significant digits to locate a particular address within that chip. The circuitry in a digital computer is designed so that this is how the addresses of data are parsed (although, in practice, at a much larger scale). Again, this is a way of taking advantage of the particular organization of numerals in a digital representation of numbers.
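This parsing can be written out directly (the 5-bit layout mirrors the toy example above; real machines use far wider addresses):

```python
def decode_address(address: int) -> tuple:
    """Parse a 5-bit address: the two most-significant bits select one
    of four chips; the three least-significant bits select one of
    eight locations within that chip."""
    chip = (address >> 3) & 0b11   # top two bits
    location = address & 0b111     # bottom three bits
    return chip, location
```

Address `0b10110`, for instance, resolves to location 6 on chip 2.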
Actual digital computers contain huge amounts of data that are processed using a large number of operations at very high speeds. They accomplish what they do by allowing programmers to string together a small number of operations in many complicated ways, usually with different paths for different combinations of possible data. At bottom, however, their operation depends on components whose principles are not different in kind to the toy example here; the only difference in actual machines is their scale.
There are advantages and disadvantages to digital computation. One very simple example is that certain operations are fast. In base-10, increasing the order of magnitude of a number is as simple as adding a zero to its digital representation. For base-2, adding a zero multiplies the number by two. Additionally, we can increase the precision with which we represent a number arbitrarily by simply adding more digits. However, some operations are slow. Determining the largest element in a list requires examining every element in that list. A digital computer must do this in much the same way a person would have to find the largest three-digit number in a stack of index cards, where each card has a single number written on it. In order to find the largest, we would have to examine the first one and compare it to the second, keeping the larger of the two. Then, we would have to compare that one to the third one, keeping the largest of those two. After repeating this process for every card in the stack, we will have found the largest element.
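The card-by-card procedure, rendered as code (a direct transcription of the comparison loop just described):

```python
def largest(cards):
    """Find the largest number in a stack by pairwise comparison,
    keeping the larger of each pair; every card must be examined."""
    best = cards[0]
    comparisons = 0
    for card in cards[1:]:
        comparisons += 1
        best = max(best, card)
    return best, comparisons
```

For a stack of n cards this takes exactly n − 1 comparisons, one per remaining card.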
One might wonder about all of the things we do with digital computers that seem to have nothing to do with the manipulation or storage of numbers. I am writing this sentence with a text editor, and in the background, several other programs are running, checking email, displaying notifications about upcoming appointments, and playing music. Very little of this is about processing numbers.
The answer, of course, is that these things are all ultimately represented in the machine by binary numbers. Not long ago, CDs were a widespread format for musical recordings. But CDs need not hold only music: they can hold data of any type. What makes this true is that, whatever the type of data, it is ultimately represented as numbers, which are represented in binary, and where the individual numerals are physically represented as pits (1s) and lands (0s) on the surface of the CD. Those numbers can then, in turn, represent elements of sounds, or letters, or pixels, or anything else that a particular program has been written to interpret them as. But the mechanisms that store, read, and manipulate all of those things can only be understood in the context of that data being represented as numbers represented digitally.
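The point is easy to demonstrate: the very same stored numbers can be read as text, as integers, or as raw bit patterns, depending entirely on how a program interprets them (a Python sketch):

```python
data = bytes([72, 105, 33])   # three numbers, stored in binary

as_text = data.decode("ascii")              # read as letters
as_numbers = list(data)                     # read as plain integers
as_bits = [format(n, "08b") for n in data]  # the underlying binary digits
```

Here `as_text` is `"Hi!"`, `as_numbers` is `[72, 105, 33]`, and `as_bits` begins with `"01001000"`: one set of numbers, three interpretations.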
Whereas digital computation uses digital representations, analog computation uses analog representations. Unfortunately, we are today much less familiar with analog computation, but simple examples can illustrate the principles quite well.
First, to understand how very different analog addition is from digital addition, consider again how we add a pair of, say, two-digit numbers as learned in elementary school. We line them up, add the ones digits and write down the result, then add the tens digits, including a carry digit if necessary. The same addition operation could be done in an analog fashion using two long rulers. Suppose we want to add thirty-four to forty-two. We locate forty-two on the first ruler, and thirty-four on the second. Then, we slide the second ruler so that its zero point lines up with forty-two. Finally, we see what number on the first ruler corresponds to thirty-four on the second ruler. Verbally this sounds complicated, but what we are doing is adding two numbers represented as magnitudes. Because the magnitudes that are doing the representing are lengths, we simply add (i.e., concatenate) those lengths.
Figure 5.5
Simple analog adder.
Mechanical analog computers of the early twentieth century used principles just like this: numbers were represented by lengths or other magnitudes such as angles, displacements, or the number of rotations of different physical components. A more complicated example is the disk integrator, shown in figure 5.6 (adapted from Maley, forthcoming).
Figure 5.6
A mechanical integrator. The left-right displacement of B is the input, and the output is the running total number of rotations of the connected shaft.
Suppose we want to integrate some function: we input the value of the function, and get as output the definite integral of that function up to that point in time. This integrator works via a constantly-rotating turntable (disk A), which drives a perpendicular disk (B), which can move closer or farther from the center of A. When B is near the edge of A, it will rotate relatively quickly; when it is at the center of A, it will not rotate at all.
The input to this integrator—the value of the function to be integrated—is the displacement of disk B relative to the center of disk A. The output is the total number of rotations of B. If the function to be integrated started at zero and then increased, B would begin in the center of A and move outward (all while A is rotating at a constant speed). If the function decreased, B would move back toward the center of A (and would move to the other side of A if the function became negative). This variation in B’s speed is such that the running total of B’s rotations is the definite integral of the function.
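A discrete-time simulation captures this relationship (the unit turntable speed and the time step are stand-ins for illustration, not parameters of any actual machine):

```python
def disk_integrator(displacements, dt=0.01):
    """Toy model of the disk integrator: B's rotation rate at each
    instant is proportional to its displacement from A's center, so
    the running total of B's rotations approximates the definite
    integral of the displacement function."""
    total_rotations = 0.0
    for d in displacements:   # d: the input value at each instant
        total_rotations += d * dt
    return total_rotations
```

Feeding in samples of f(t) = 2t over the interval [0, 1] yields a running total near 1, the value of the definite integral of 2t on that interval.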
Much like digital computers, these components could be combined in a large number of ways to produce much more complicated computations at speeds greater than anything a human could perform. One example of such a mechanical analog computer is shown in figure 5.7, where many such components are connected together, as well as drawing tables upon which output or input could be plotted graphically. These mechanical machines eventually gave way to electronic analog computers that used electrical magnitudes, such as voltage or resistance, to represent numbers; Maley (forthcoming) contains several examples.
Figure 5.7
Mechanical analog computer. Copyright Department of Computer Science and Technology, University of Cambridge. Reproduced by permission.
Just like their digital counterparts, analog computers have advantages and disadvantages. Some operations are fast. For example, suppose (once again) that we want to find the largest in a set of numbers. This time, instead of being written down on index cards (i.e., represented digitally), they are represented by lengths of dry spaghetti noodles (an example taken from Dewdney, 1984). In order to find the largest element, one simply needs to hold the bundle of all noodles together, tamp one end down on a hard surface, and then examine which noodle sticks up the highest. This process is essentially a single step, whereas the digital searching task took as many steps as there were elements to search. Other operations are slower. Whereas increasing the order of magnitude in a digital representation amounted to adding a digit to the representation, such an increase in the analog case can be much more complicated. Adding nine centimeters to a length of one centimeter takes some work, and adding ninety more takes even more work; in the digital case, we simply add a digit, then add another one.
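The contrast can be sketched in code, with `max` standing in for the single physical act of eyeballing the tallest noodle (only a model: the analog step's parallelism is physical, not computational):

```python
noodles = [3.4, 9.1, 2.7, 6.5]   # lengths in centimeters

# Analog method: tamp the bundle and see which noodle sticks up
# highest. One physical act; max() stands in for it here.
tallest = max(noodles)

# Digital method: compare card by card, keeping the larger each time.
steps = 0
best = noodles[0]
for length in noodles[1:]:
    steps += 1
    best = max(best, length)
```

Both methods agree on the answer; they differ in how the number of steps grows with the size of the collection.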
Perhaps the most significant disadvantage of analog computation is the issue of precision. Because analog representation just is the representation of magnitudes by physical magnitudes, what we can know about a representation is limited to what we can observe about the physical magnitudes in question. It is difficult to measure lengths in thousandths of a centimeter without specialized equipment, for example; thus, a representation using length might be limited to a small number of significant digits. In the digital case, however, we can increase precision indefinitely by just adding more digits to the representations.
Analog computers no longer exist in any quantity outside museums and private collections; this is how it should be. As practical computing machines, contemporary digital computers outperform analog computers in almost every way. However, analog computation does have a place in our understanding of what computation is in general, not just in the special case of the digital. Moreover, it may well be that, in many cases, neural systems use analog representation and computation. One simple example is the representation of stimuli via neural firing rates: it has been known for almost a century that some neurons will fire more frequently as a tone gets louder or a muscle is increasingly stretched (Adrian, 1926). The firing rate thus monotonically increases as the relevant stimulus increases, which is an analog representation. Combine this with the right mechanisms in place to process that representation, and we have a clear case of analog computation (Maley, 2018).
Properly understood, analog computation and representation give us a new understanding of what it could mean for minds—whether natural or artificial—to be computational. This approach might make the thesis that the mind/brain is computational in the first place more palatable to certain anti-computation/anti-representation research programs in cognitive science (a similar point has been made by Isaac, 2018). In particular, certain proponents of radical embodied cognition (Chemero, 2009; Van Gelder, 1995) and radical enactive cognition (Hutto, 2006) have argued that cognition is not computation, and that representation has no place in understanding cognition. However, what they all have in mind is digital—or at least non-analog—computation, where representations are independent from their physical bases, and systems can be understood in purely dynamical terms, without the necessity of any representational language.
Again, there is more to say than there is space to say it, but perhaps it is better to be brief and wrong (but interestingly wrong, one hopes) than to say nothing at all. Consider the Watt governor depicted in figure 5.8, an example originally introduced by Van Gelder (1995), purportedly as a system with complex behavior that can be perfectly described without positing representations. The Watt governor is a device meant to regulate (or govern) the speed of a steam engine. Steam flows into an engine, which drives a flywheel, which powers some other machinery. The more steam that goes into the engine, the faster the flywheel turns; the governor keeps this speed within a narrow range by adjusting the amount of steam flowing into the engine. The flywheel connects to the horizontal pulley on the bottom left of figure 5.8, causing the metal spheres to rotate. Those spheres are connected by hinges to arms so that, as they rotate, they move outward (by so-called centrifugal force). Those hinges are connected to smaller arms, connected to a large washer such that, as the spheres rotate faster, they cause the washer to move down, which in turn pulls down a lever connected to a valve within the steam pipe, which then reduces the amount of steam flowing through the pipe. In short, it uses the engine speed to drive a mechanism that then reduces that speed (by closing the steam valve) when the speed gets too high, but increases the steam flow when the speed gets too low. It is essentially a feedback mechanism that keeps the steam flow within a certain range, thus keeping the engine at a certain speed.
Figure 5.8
The Watt, or centrifugal, governor.
Famously, van Gelder argued that the arm angle (the crucial variable in understanding this device) can be completely characterized by a differential equation. Had the valve been controlled with a small digital computer, hooked up to sensors and servos, we would certainly have to talk of representations and algorithms in order to understand the device. But given that it is not, a single equation (with the right parameters) completely captures all we need to know about this system. Although one might be tempted to think of the arm angle as representing the engine speed, van Gelder claims this would be a mistake: the equation is all one needs, and talk of representation and computation is superfluous and even potentially misleading. Thus—the argument goes—our mind/brain may well be a system like the Watt governor, where one does not need any talk of representation or computation, and so the computational assumption at the heart of cognitive science might be wrong.
Interestingly, the Watt governor is almost a perfect example of an analog computer on the account I have outlined above (a possibility not considered by van Gelder), and actually quite similar to the mechanical analog computers used in military ships in and around World War II, mentioned earlier (Mindell, 2002). These machines took as input the ship’s speed and heading, represented as the rotation of shafts and angles of mechanical arms, plus information about the target ship’s speed and heading, also represented as mechanical quantities. The output—the gun position required to hit the target—was a mathematical function of the inputs, computed by various mechanical devices, and output as a physical displacement that connected directly to the relevant gun. It is not at all clear why these mechanical analog devices count as representing inputs and computing functions (by the lights of the very people who created them), whereas the Watt governor does not.
Of course, whether the arm angle of the Watt governor represents the engine speed is contentious, and simply asserting that it does would beg the question against van Gelder. A positive argument is required, and one that does not end up characterizing all mechanical devices as computational. Unfortunately, such an argument would take us too far afield. However, what can be shown is that, if the Watt governor were used in a slightly different context, then it would be an analog computer, and we would attribute to it representations in order to analyze and explain what it does, even by van Gelder’s lights.
The Watt governor would be an analog computer if we were to use the speed of the engine as the input and the angle of the steam valve as the output, where the function computed is just the mathematical relation between the two. In that context, it would also be quite similar to the mechanical integrator illustrated in figure 5.6. Now, by van Gelder’s own lights, the function that describes the relationship between the input and output is a mathematical one, and thus this device would compute precisely that function. All that remains is for the inputs and outputs to be genuine representations. As we have seen, all kinds of physical quantities can serve as the inputs and outputs to analog computers: besides the electrical and mechanical examples we have already seen, the economy-simulating MONIAC analog computer used fluid levels and fluid flow as variables (Isaac, 2018). Thus, engine speed and valve angle are as good as anything.
Now, one might object to the idea that we can just take an arbitrary mechanical component and use it as part of an analog computer. However, this should not be all that surprising. Even though analog computers are not medium-independent in the way non-analog computers are, there is still a wide variety of ways to physically create an analog computer. The primary difference is that a given mechanism might only be able to implement one particular mathematical function because of physical constraints on what it can represent. The Watt governor, for example, has an output that is a specific trigonometric function of its input with a narrow range. If we need such a function, the Watt governor will work perfectly in our mechanical analog computer, and we will explain the mechanism that transforms the input to the output by showing how the arm angles (and other components) represent the quantities specified in the equation. But we cannot use the Watt governor to implement just any function we want.
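Under the textbook conical-pendulum idealization (my simplification for illustration, not van Gelder's own equation), the equilibrium arm angle is indeed a trigonometric function of rotation speed, defined only over a narrow range:

```python
import math

def equilibrium_arm_angle(omega, arm_length=0.2, g=9.81):
    """Idealized centrifugal governor at steady state, modeled as a
    conical pendulum: cos(theta) = g / (L * omega**2). Below a minimum
    speed the arms simply hang; above it, the angle grows with speed."""
    c = g / (arm_length * omega ** 2)
    if c >= 1.0:
        return 0.0   # spinning too slowly: arms hang straight down
    return math.acos(c)
```

The arm-length and gravity values here are arbitrary illustrative parameters; the point is only that the input-output relation is a fixed trigonometric one, not freely reprogrammable.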
Whether radical embodied and enactive cognitive scientists will be amenable to these ideas will ultimately be up to them; one might anticipate that they will take the notion of “analog representation” developed here to be a rather thin notion of representation, which in turn makes analog computation itself rather thin. However, if historical precedent matters at all, analog computers, which preceded digital computers, are just as legitimate a form of computation as anything. Those hostile to computation but friendly to dynamical systems (for example) may have to reevaluate who they think their enemies are.
Of course, it remains to be seen whether cognitive science is correct in taking the mind/brain to be a computer in the first place. Philosophers of computation have gone a long way in helping to clarify what computation might be, allowing for this thesis to rise to the level of a testable hypothesis, rather than a working assumption, or a mere metaphorical framework. Properly understanding analog computation is a part of this effort, allowing researchers more options for understanding how the mind/brain might be a computer in a principled, systematic way.
1. Which is not to say that nothing has been written: a few excellent examples include Mahoney’s work, as well as (Campbell-Kelly et al., 2013) and (Haigh and Ceruzzi, 2021).
2. It is well known that the term “computer” referred to a human performing computations; in this section, I will refer to either “human computers” or “computing machines” to avoid ambiguity.
3. Or if not voltages, whatever the implementational media happens to be.
4. By a “continuous” representation, I mean a representation that can vary continuously: along a continuum. An individual representation cannot, by itself, be continuous; but phrasing things otherwise is rather cumbersome, so I will use this convention.
5. I will use the somewhat cumbersome English spelling of numbers here, simply to avoid using one representational scheme in the text to make a point about a completely different scheme.
6. Note that I am not specifying how the relevant properties of the representation would actually change; that is not relevant here. For the moment, I am just elucidating what is required for something to be an analog representation at all.
7. This is characterized more precisely in (Maley, forthcoming).
8. The first example is also a case of digital representation, although the second is not: more on this distinction is given in the next section.
9. I use the qualifier “in a sense” because even though the real numbers are continuous, a single real number is not, and it is not the case that the set of all digital expansions of real numbers is itself continuous.
10. In the case of bases higher than 10, new symbols are used as numerals. The hexadecimal (base-16) system uses A for 10, B for 11, and so on, so that “A2F” = (A × 16²) + (2 × 16¹) + (F × 16⁰) = (10 × 256) + (2 × 16) + (15 × 1) = 2607.
11. For example, 48π is interpreted as the multiplication of a two-digit number (forty-eight) with a symbolically represented one (π), but not as a three-digit number.
12. Usually, “1” is represented by five volts, and “0” is represented by zero volts, but there are other conventions.
The central aim of artificial intelligence (AI) is to build intelligent machines. But what does it mean for a machine to be “intelligent”? And how can we tell when a machine is, in fact, intelligent? The articles in this part take different stances on these fundamental questions.
According to one dominant tradition, tracing to Turing’s “Computing Machinery and Intelligence” (chapter 6), one answer suffices for both questions. In that article, Turing introduces the movie-worthy “Turing Test” for intelligence, which uses the paradigm of human conversation as the touchstone for assessing machines. That test operationalizes “intelligent machine” as a machine that can convince observers more often than not that it is (likely) a (typical) human. As with Justice Potter Stewart’s definition of pornography, we know intelligence when we see it. Furthermore, we come to know it by comparing it to exemplars: in the case of AI, to human intelligent behavior.
Levesque’s argument (in chapter 7) is narrower and more proscriptive: even granting that symbol manipulation might suffice for understanding, the Turing test is too easily gamed. One of the first chatbots, Eliza, plays the role of a Rogerian psychotherapist by simply turning questions around on the interlocutor: “Are you sure?” “And how does [that] make you feel?” “How long have you been [feeling like that]?” “I’m not sure I understand you fully.” Although Eliza is hardly convincing to a determined interlocutor, it shows a kind of strategy that future chatbots emulate more effectively: using humor and other deflection strategies to create a charming conversationalist, good at distracting the interlocutor from assessing whether its conversation partner really knows what’s going on. What is required, Levesque claims, is a more direct test of whether and what the computer understands. For this, he argues that “Winograd schemas” can be used to test an interlocutor’s ability to use background knowledge to disambiguate anaphoric pronoun reference in cases in which common background knowledge clearly and decisively solves the puzzle.
In her broad and provocative essay (chapter 10), Melanie Mitchell argues that we do not yet have a clear conception of what intelligence is. She claims that this lack of a guiding idea is at least partially to blame for the cycles of enthusiasm (AI springs) and disappointment (AI winters) that characterize the history of AI. She identifies four fundamental “fallacies” that, she argues, give AI researchers an artificial sense of progress toward the goal of generalized AI—that is, toward AI that more or less captures how typical competent adults function. One unifying theme in all these fallacies is a difference between the kind of narrow, domain-specific problem-solving capacity that produces so much excitement (e.g., face recognition) and the kind of broad, domain-general knowledge that seems to come so effortlessly to the human mind.
This relationship between domain-specific and domain-general capacities is precisely what is at stake in Fodor’s classic distinction between vertical (narrow, domain-specific, informationally encapsulated processors) and horizontal (broad, integrative, informationally porous) systems. Fodor’s central thesis (chapter 9) is that domain-specific processing systems can take us only so far in our understanding of the mind. Our ability to endorse beliefs (as in the development of scientific beliefs), our ability to use language to describe domain-specific sensory inputs, and the integration of perception and action in purposive, goal-driven ways all seem to require access to broad, general information that one cannot specify in advance. Fodor argues, on this basis, that there must be non-modular systems (i.e., systems that are not narrowly informationally encapsulated) that can integrate the input from different modular systems to achieve the kind of belief-formation and agential capacities typical of humans.
One common way to characterize the kind of central processing that Fodor is after is in terms of “rationality”: belief fixation and purposive action are, one might think, definitively characterized in terms of a kind of reason-responsiveness and, as such, a responsiveness to the rules of correct or incorrect inference. Russell’s article takes as its fundamental assumption that a central goal of AI is to build machines that are rational. But, he asks, what is rationality? Is it the kind of rationality Fodor has in mind? Furthermore, how rational do we expect our AI machines to be? Must their rationality be perfect (and Spock-like), or is it sufficient merely to achieve human (i.e., demonstrably sub-optimal) levels of rationality? Additionally, must the rationality be evident in behavior alone, or must the internal reasoning process also ape human ratiocinative thought? By exploring the contours of these questions, Russell notes some of the obvious limitations on computational systems, as well as on minds. Time constraints matter, for example. Even paradigmatically intelligent humans can’t consider all possible information. But how fast must these machines work? And how rational must they be? And what is the sense of rationality that constitutes the appropriate target in the effort to build suitably rational machines? In the process, Russell explores degrees of “hardness” in AI problems and offers a theory of bounded rationality that, he argues, suffices as a reasonable target for the field.
The Turing Test. The Turing test has been both controversial and central to discussions in AI. Those looking to read further might consider the following essays.
Winograd Schemas. Those wishing to follow up on Levesque’s suggestion that Winograd schemas provide a superior test over the Turing test might consider reviewing the following recent articles:
The Frame Problem. You open a bottle of root beer. In doing so, you make the soda accessible to drinking, you reduce the pressure in the bottle, and you bend a bottle cap. Yet most other things continue unchanged: the orbit of Mars, the outcome of the Pirates’ game, and the curious delicacy of soup dumplings at Joe’s Shanghai. Although we assume that single small changes do not change everything else in the world, that principle has proved remarkably difficult to formalize. Yet formalizing such a notion (knowing what does not change with our actions) seems to be required to build a rational agent. Fodor notes that this problem arises for his central processor: given that it is informationally un-encapsulated, any perception or action might be relevant to all of one’s beliefs. Surely this is part of the challenge of encapsulating commonsense knowledge that Mitchell has in mind in her essay in this part (chapter 10). It is also related to arguments for modularity of mind considered by Woodward and Cowie in part IV, as one benefit of modular systems is that they need not consider the implications of new input for all of one’s beliefs. Consider the following:
Bounded Rationality. It is no trivial matter to say what non-bounded rationality is, though it is often taken to involve at least conformity of one’s thoughts with the axioms of logic and mathematics, and conformity of one’s decisions and actions with expected utility theory. Everyone knows that humans fall short of these standards; some of the major themes in twentieth-century psychology and decision theory involve cataloging the ways in which human rationality is bounded. Some of these bounds have to do with limits on our memory or our time. Others have to do with the sorts of environments in which we evolved, and the specific skill sets for which they selected. For a few small sips of this firehose of information, one might consider:
AGI and UAI. One way to contrast contemporary machine learning with the broad, flexible intelligence found in humans is to say that the latter is general or universal. This has given rise to two traditions with slightly different names: the search for Artificial General Intelligence (AGI) or for Universal Artificial Intelligence (UAI). The differences between the two tend to be matters of intellectual tradition and technological focus; both are concerned with ways in which truly general intelligence might be built. Consider the following:
Alan M. Turing
1950
I propose to consider the question “Can machines think?” This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words ‘machine’ and ‘think’ are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, “Can machines think?” is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.
The new form of the problem can be described in terms of a game which we call the “imitation game”. It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either “X is A and Y is B” or “X is B and Y is A”. The interrogator is allowed to put questions to A and B thus:
C: Will X please tell me the length of his or her hair?
Now suppose X is actually A, then A must answer. It is A’s object in the game to try to cause C to make the wrong identification. His answer might therefore be:
A: My hair is shingled, and the longest strands are about nine inches long.
In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as “I am the woman, don’t listen to him!” to her answers, but it will avail nothing as the man can make similar remarks.
We now ask the question, “What will happen when a machine takes the part of A in this game?” Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, “Can machines think?”
As well as asking, “What is the answer to this new form of the question?” one may ask, “Is this new question a worthy one to investigate?” This latter question we investigate without further ado, thereby cutting short an infinite regress.
The new problem has the advantage of drawing a fairly sharp line between the physical and the intellectual capacities of a man. No engineer or chemist claims to be able to produce a material which is indistinguishable from the human skin. It is possible that at some time this might be done, but even supposing this invention available we should feel there was little point in trying to make a “thinking machine” more human by dressing it up in such artificial flesh. The form in which we have set the problem reflects this fact in the condition which prevents the interrogator from seeing or touching the other competitors, or hearing their voices. Some other advantages of the proposed criterion may be shown up by specimen questions and answers. Thus:
Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R–R8 mate.
The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavor that we wish to include. We do not wish to penalize the machine for its inability to shine in beauty competitions, nor to penalize a man for losing in a race against an airplane. The conditions of our game make these disabilities irrelevant. The “witnesses” can brag, if they consider it advisable, as much as they please about their charms, strength or heroism, but the interrogator cannot demand practical demonstrations.
The game may perhaps be criticized on the ground that the odds are weighted too heavily against the machine. If the man were to try and pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection.
It might be urged that when playing the “imitation game” the best strategy for the machine may possibly be something other than imitation of the behavior of a man. This may be, but I think it is unlikely that there is any great effect of this kind. In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man.
The question which we put in Section 6.1 will not be quite definite until we have specified what we mean by the word ‘machine’. It is natural that we should wish to permit every kind of engineering technique to be used in our machines. We also wish to allow the possibility that an engineer or team of engineers may construct a machine which works, but whose manner of operation cannot be satisfactorily described by its constructors because they have applied a method which is largely experimental. Finally, we wish to exclude from the machines men born in the usual manner. It is difficult to frame the definitions so as to satisfy these three conditions. One might for instance insist that the team of engineers should be all of one sex, but this would not really be satisfactory, for it is probably possible to rear a complete individual from a single cell of the skin (say) of a man. To do so would be a feat of biological technique deserving of the very highest praise, but we would not be inclined to regard it as a case of “constructing a thinking machine”. This prompts us to abandon the requirement that every kind of technique should be permitted. We are the more ready to do so in view of the fact that the present interest in “thinking machines” has been aroused by a particular kind of machine, usually called an “electronic computer” or “digital computer”. Following this suggestion we only permit digital computers to take part in our game.
This restriction appears at first sight to be a very drastic one. I shall attempt to show that it is not so in reality. To do this necessitates a short account of the nature and properties of these computers.
It may also be said that this identification of machines with digital computers, like our criterion for “thinking”, will only be unsatisfactory if (contrary to my belief), it turns out that digital computers are unable to give a good showing in the game.
There are already a number of digital computers in working order, and it may be asked, “Why not try the experiment straight away? It would be easy to satisfy the conditions of the game. A number of interrogators could be used, and statistics compiled to show how often the right identification was given.” The short answer is that we are not asking whether all digital computers would do well in the game nor whether the computers at present available would do well, but whether there are imaginable computers which would do well. But this is only the short answer. We shall see this question in a different light later.
The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer. The human computer is supposed to be following fixed rules; he has no authority to deviate from them in any detail. We may suppose that these rules are supplied in a book, which is altered whenever he is put on to a new job. He has also an unlimited supply of paper on which he does his calculations. He may also do his multiplications and additions on a “desk machine”, but this is not important.
If we use the above explanation as a definition, we shall be in danger of circularity of argument. We avoid this by giving an outline of the means by which the desired effect is achieved. A digital computer can usually be regarded as consisting of three parts:
(i) Store.
(ii) Executive unit.
(iii) Control.
The store is a store of information, and corresponds to the human computer’s paper, whether this is the paper on which he does his calculations or that on which his book of rules is printed. Insofar as the human computer does calculations in his head, a part of the store will correspond to his memory.
The executive unit is the part which carries out the various individual operations involved in a calculation. What these individual operations are will vary from machine to machine. Usually fairly lengthy operations, such as “Multiply 3540675445 by 7076345687”, can be done, but in some machines only very simple ones, such as “Write down 0”, are possible.
We have mentioned that the “book of rules” supplied to the computer is replaced in the machine by a part of the store. It is then called the “table of instructions”. It is the duty of the control to see that these instructions are obeyed correctly and in the right order. The control is so constructed that this necessarily happens.
The information in the store is usually broken up into packets of moderately small size. In one machine, for instance, a packet might consist of ten decimal digits. Numbers are assigned to the parts of the store in which the various packets of information are stored, in some systematic manner. A typical instruction might say:
Add the number stored in position 6809 to that in 4302 and put the result back into the latter storage position.
Needless to say it would not occur in the machine expressed in English. It would more likely be coded in a form such as 6809430217. Here 17 says which of various possible operations is to be performed on the two numbers—in this case the operation that is described above, namely, “Add the number…”. It will be noticed that the instruction takes up 10 digits and so forms one packet of information, very conveniently. The control will normally take the instructions to be obeyed in the order of the positions in which they are stored, but occasionally an instruction such as
Now obey the instruction stored in position 5606, and continue from there.
may be encountered, or again
If position 4505 contains 0 obey next the instruction stored in 6707, otherwise continue straight on.
Instructions of these latter types are very important because they make it possible for a sequence of operations to be repeated over and over again until some condition is fulfilled, but in doing so to obey, not fresh instructions on each repetition, but the same ones over and over again. To take a domestic analogy, suppose Mother wants Tommy to call at the cobbler’s every morning on his way to school to see if her shoes are done. She can ask him afresh every morning. Alternatively she can stick up a notice once and for all in the hall which he will see when he leaves for school and which tells him to call for the shoes, and also to destroy the notice when he comes back if he has the shoes with him.
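Turing’s specimen instructions (add two stored numbers, jump to a stored position, and jump conditionally) are already enough to build loops of exactly the kind the shoe-fetching analogy describes. The toy interpreter below is an illustrative sketch, not the Manchester machine’s actual coding: the tuple-based instruction format, the opcode names, and the storage positions are all invented for this example.

```python
# A toy stored-program machine built from Turing's three specimen
# instructions. Instructions and data share one store, as in the text.
# The encoding below (tuples with invented opcode names) is purely
# illustrative; real machines of the period used numeric codes
# such as 6809430217.

def run(store, pc=0, max_steps=1000):
    """Obey the instructions in `store` (a dict: position -> contents).

    ("add", a, b)  add the number in position a to that in b,
                   and put the result back into position b
    ("jmp", a)     now obey the instruction stored in position a
    ("jz", a, b)   if position a contains 0, obey next the
                   instruction stored in b; otherwise continue on
    ("halt",)      stop
    """
    for _ in range(max_steps):
        inst = store[pc]
        op = inst[0]
        if op == "add":
            _, a, b = inst
            store[b] = store[b] + store[a]
            pc += 1
        elif op == "jmp":
            pc = inst[1]
        elif op == "jz":
            _, a, b = inst
            pc = b if store[a] == 0 else pc + 1
        elif op == "halt":
            return store
    raise RuntimeError("step limit exceeded")

# Multiply 6 by 7 by repeated addition. The same instructions at
# positions 0 through 3 are obeyed over and over again, not fresh
# ones on each repetition, until the counter reaches zero.
program = {
    0: ("jz", 100, 4),      # counter exhausted? then halt
    1: ("add", 101, 102),   # accumulator += 7
    2: ("add", 103, 100),   # counter += (-1)
    3: ("jmp", 0),          # repeat the same instructions
    4: ("halt",),
    100: 6,                 # counter
    101: 7,                 # addend
    102: 0,                 # accumulator
    103: -1,                # constant -1
}
result = run(program)
print(result[102])  # 42
```

The conditional jump at position 0 is what lets the sequence terminate: without instructions of that latter type, the loop could never depend on a condition being fulfilled.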
The reader must accept it as a fact that digital computers can be constructed, and indeed have been constructed, according to the principles we have described, and that they can in fact mimic the actions of a human computer very closely.
The book of rules which we have described our human computer as using is of course a convenient fiction. Actual human computers really remember what they have got to do. If one wants to make a machine mimic the behavior of the human computer in some complex operation one has to ask him how it is done, and then translate the answer into the form of an instruction table. Constructing instruction tables is usually described as “programming”. To “program a machine to carry out the operation A” means to put the appropriate instruction table into the machine so that it will do A.
An interesting variant on the idea of a digital computer is a digital computer with a random element. These have instructions involving the throwing of a die or some equivalent electronic process; one such instruction might for instance be
Throw the die and put the resulting number into store 1000.
Sometimes such a machine is described as having free will (though I would not use this phrase myself). It is not normally possible to determine from observing a machine whether it has a random element, for a similar effect can be produced by such devices as making the choices depend on the digits of the decimal for π.
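The point about the digits of π can be made concrete with a small sketch: a deterministic “die” whose throws are read off the decimal expansion of π looks, from its outputs alone, like a machine with a random element. The digit-to-face mapping below is an arbitrary choice for illustration.

```python
# A deterministic "die" driven by the decimal digits of pi. An
# observer seeing only the throws cannot tell that no genuinely
# random element is involved. The mapping digit -> (digit % 6) + 1
# is an invented convention for this sketch.

PI_DIGITS = "141592653589793238462643383279"  # digits of pi after "3."

def pi_die():
    """Yield successive throws in 1..6 derived from pi's digits."""
    for d in PI_DIGITS:
        yield int(d) % 6 + 1

throws = list(pi_die())
print(throws[:8])  # [2, 5, 2, 6, 4, 3, 1, 6]
```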
Most actual digital computers have only a finite store. There is no theoretical difficulty in the idea of a computer with an unlimited store. Of course only a finite part of it can have been used at any one time. Likewise only a finite amount can have been constructed, but we can imagine more and more being added as required. Such computers have special theoretical interest and will be called infinite capacity computers.
The idea of a digital computer is an old one. Charles Babbage, Lucasian Professor of Mathematics at Cambridge from 1828 to 1839, planned such a machine, called the “Analytical Engine”, but it was never completed. Although Babbage had all the essential ideas, his machine was not at that time such a very attractive prospect. The speed which would have been available would be definitely faster than a human computer but something like 100 times slower than the Manchester machine, itself one of the slower of the modern machines. The storage was to be purely mechanical, using wheels and cards.
The fact that Babbage’s Analytical Engine was to be entirely mechanical will help us to rid ourselves of a superstition. Importance is often attached to the fact that modern digital computers are electrical, and that the nervous system also is electrical. Since Babbage’s machine was not electrical, and since all digital computers are in a sense equivalent, we see that this use of electricity cannot be of theoretical importance. Of course electricity usually comes in where fast signaling is concerned, so it is not surprising that we find it in both these connections. In the nervous system chemical phenomena are at least as important as electrical. In certain computers the storage system is mainly acoustic. The feature of using electricity is thus seen to be only a very superficial similarity. If we wish to find such similarities we should look rather for mathematical analogies of function.
The digital computers considered in the last section may be classified among the “discrete state machines”. These are the machines which move by sudden jumps or clicks from one quite definite state to another. These states are sufficiently different for the possibility of confusion between them to be ignored. Strictly speaking there are no such machines. Everything really moves continuously. But there are many kinds of machines which can profitably be thought of as being discrete state machines. For instance in considering the switches for a lighting system it is a convenient fiction that each switch must be definitely on or definitely off. There must be intermediate positions, but for most purposes we can forget about them. As an example of a discrete state machine, we might consider a wheel which clicks round through 120° once a second, but may be stopped by a lever which can be operated from outside; in addition a lamp is to light in one of the positions of the wheel. This machine could be described abstractly as follows: The internal state of the machine (which is described by the position of the wheel) may be q1, q2, or q3. There is an input signal i0 or i1 (position of lever). The internal state at any moment is determined by the last state and input signal according to the table

              Last state
              q1    q2    q3
Input  i0     q2    q3    q1
       i1     q1    q2    q3
The output signals, the only externally visible indication of the internal state (the light), are described by the table

State    q1    q2    q3
Output   o0    o0    o1
This example is typical of discrete state machines. They can be described by such tables, provided they have only a finite number of possible states.
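Such a table is all there is to a discrete state machine, so a short program can simulate the wheel machine directly from it. The dictionaries below transcribe the transition and output tables from Turing’s original paper (the lamp, o1, lights in state q3); the simulator itself is an illustrative sketch.

```python
# Simulating the three-state wheel machine from its tables. Any
# discrete state machine with finitely many states can be handled
# in exactly this table-driven way.

TRANSITION = {  # (input signal, last state) -> next state
    ("i0", "q1"): "q2", ("i0", "q2"): "q3", ("i0", "q3"): "q1",
    ("i1", "q1"): "q1", ("i1", "q2"): "q2", ("i1", "q3"): "q3",
}
OUTPUT = {"q1": "o0", "q2": "o0", "q3": "o1"}  # o1 = lamp lit

def run_dsm(inputs, state="q1"):
    """Return the sequence of output signals for a sequence of inputs."""
    outputs = []
    for signal in inputs:
        state = TRANSITION[(signal, state)]
        outputs.append(OUTPUT[state])
    return outputs

# Let the wheel click twice (lever released, i0), then hold the
# lever (i1): the lamp lights on reaching q3 and stays lit.
print(run_dsm(["i0", "i0", "i1", "i1"]))  # ['o0', 'o1', 'o1', 'o1']
```

Since the machine is exhausted by its tables, a general-purpose computer supplied with those tables can mimic it, which is just the universality point Turing goes on to make.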
It will seem that given the initial state of the machine and the input signals it is always possible to predict all future states. This is reminiscent of Laplace’s view that from the complete state of the universe at one moment of time, as described by the positions and velocities of all particles, it should be possible to predict all future states. The prediction which we are considering is, however, rather nearer to practicability than that considered by Laplace. The system of the “universe as a whole” is such that quite small errors in the initial conditions can have an overwhelming effect at a later time. The displacement of a single electron by a billionth of a centimeter at one moment might make the difference between a man being killed by an avalanche a year later, or escaping. It is an essential property of the mechanical systems which we have called “discrete state machines” that this phenomenon does not occur. Even when we consider the actual physical machines instead of the idealized machines, reasonably accurate knowledge of the state at one moment yields reasonably accurate knowledge any number of steps later.
As we have mentioned, digital computers fall within the class of discrete state machines. But the number of states of which such a machine is capable is usually enormously large. For instance, the number for the machine now working at Manchester is about 2^165,000—that is, about 10^50,000. Compare this with our example of the clicking wheel described above, which had three states. It is not difficult to see why the number of states should be so immense. The computer includes a store corresponding to the paper used by a human computer. It must be possible to write into the store any one of the combinations of symbols which might have been written on the paper. For simplicity suppose that only digits from 0 to 9 are used as symbols. Variations in handwriting are ignored. Suppose the computer is allowed 100 sheets of paper each containing 50 lines each with room for 30 digits. Then the number of states is 10^(100×50×30)—that is, 10^150,000. This is about the number of states of three Manchester machines put together. The logarithm to the base two of the number of states is usually called the “storage capacity” of the machine. Thus the Manchester machine has a storage capacity of about 165,000 and the wheel machine of our example about 1.6. If two machines are put together their capacities must be added to obtain the capacity of the resultant machine. This leads to the possibility of statements such as “The Manchester machine contains 64 magnetic tracks each with a capacity of 2560, eight electronic tubes with a capacity of 1280. Miscellaneous storage amounts to about 300 making a total of 174,380.”
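The arithmetic in this paragraph can be checked directly by taking logarithms to the base two of the state counts; the short sketch below does so (the variable names are ours, and the state counts are computed via logarithms rather than materializing numbers with 150,000 digits).

```python
import math

# Storage capacity = log2(number of possible states).

# The wheel machine has three states.
print(round(math.log2(3), 1))            # 1.6

# The paper store: 100 sheets x 50 lines x 30 digits, each position
# holding one of the 10 symbols 0..9, hence 10^150,000 states.
digits = 100 * 50 * 30                   # 150,000 digit positions
capacity_paper = digits * math.log2(10)  # log2(10 ** 150000)
print(round(capacity_paper))             # 498289 bits

# Three Manchester machines put together add their capacities,
# giving roughly the same figure:
print(3 * 165_000)                       # 495000
```

That 3 × 165,000 = 495,000 comes out close to 498,289 is exactly why Turing says the paper store has “about the number of states of three Manchester machines put together.”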
Given the table corresponding to a discrete state machine, it is possible to predict what it will do. There is no reason why this calculation should not be carried out by means of a digital computer. Provided it could be carried out sufficiently quickly the digital computer could mimic the behavior of any discrete state machine. The imitation game could then be played with the machine in question (as B) and the mimicking digital computer (as A) and the interrogator would be unable to distinguish them. Of course the digital computer must have adequate storage capacity as well as working sufficiently fast. Moreover, it must be programmed afresh for each new machine which it is desired to mimic.
This special property of digital computers, that they can mimic any discrete state machine, is described by saying that they are universal machines. The existence of machines with this property has the important consequence that, considerations of speed apart, it is unnecessary to design various new machines to do various computing processes. They can all be done with one digital computer, suitably programmed for each case. It will be seen that as a consequence of this all digital computers are in a sense equivalent.
We may now consider again the point raised at the end of Section 6.3. It was suggested tentatively that the question, “Can machines think?” should be replaced by “Are there imaginable digital computers which would do well in the imitation game?” If we wish we can make this superficially more general and ask, “Are there discrete state machines which would do well?” But in view of the universality property we see that either of these questions is equivalent to this: “Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate program, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?”
We may now consider the ground to have been cleared and we are ready to proceed to the debate on our question, “Can machines think?” and the variant of it quoted at the end of the last section. We cannot altogether abandon the original form of the problem, for opinions will differ as to the appropriateness of the substitution and we must at least listen to what has to be said in this connection.
It will simplify matters for the reader if I explain first my own beliefs in the matter. Consider first the more accurate form of the question. I believe that in about fifty years’ time it will be possible to program computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. The original question, “Can machines think?” I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. I believe further that no useful purpose is served by concealing these beliefs. The popular view that scientists proceed inexorably from well-established fact to well-established fact, never being influenced by any unproved conjecture, is quite mistaken. Provided it is made clear which are proved facts and which are conjectures, no harm can result. Conjectures are of great importance since they suggest useful lines of research.
I now proceed to consider opinions opposed to my own.
(1) THE THEOLOGICAL OBJECTION. Thinking is a function of man’s immortal soul. God has given an immortal soul to every man and woman, but not to any other animal or to machines. Hence no animal or machine can think.1
I am unable to accept any part of this, but will attempt to reply in theological terms. I should find the argument more convincing if animals were classed with men, for there is a greater difference, to my mind, between the typical animate and the inanimate than there is between man and the other animals. The arbitrary character of the orthodox view becomes clearer if we consider how it might appear to a member of some other religious community. How do Christians regard the Moslem view that women have no souls? But let us leave this point aside and return to the main argument. It appears to me that the argument quoted above implies a serious restriction of the omnipotence of the Almighty. It is admitted that there are certain things that He cannot do such as making one equal to two, but should we not believe that He has freedom to confer a soul on an elephant if He sees fit? We might expect that He would only exercise this power in conjunction with a mutation which provided the elephant with an appropriately improved brain to minister to the needs of this soul. An argument of exactly similar form may be made for the case of machines. It may seem different because it is more difficult to “swallow”. But this really only means that we think it would be less likely that He would consider the circumstances suitable for conferring a soul. The circumstances in question are discussed in the rest of this paper. In attempting to construct such machines we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates.
However, this is mere speculation. I am not very impressed with theological arguments, whatever they may be used to support. Such arguments have often been found unsatisfactory in the past. In the time of Galileo it was argued that the texts, “And the sun stood still…and hasted not to go down about a whole day” (Joshua x. 13) and “He laid the foundations of the earth, that it should not move at any time” (Psalm cv. 5) were an adequate refutation of the Copernican theory. With our present knowledge, such an argument appears futile. When that knowledge was not available, it made a quite different impression.
(2) THE “HEADS IN THE SAND” OBJECTION. “The consequences of machines thinking would be too dreadful. Let us hope and believe that they cannot do so.”
This argument is seldom expressed quite so openly as in the form above. But it affects most of us who think about it at all. We like to believe that Man is in some subtle way superior to the rest of creation. It is best if he can be shown to be necessarily superior, for then there is no danger of him losing his commanding position. The popularity of the theological argument is clearly connected with this feeling. It is likely to be quite strong in intellectual people, since they value the power of thinking more highly than others, and are more inclined to base their belief in the superiority of Man on this power.
I do not think that this argument is sufficiently substantial to require refutation. Consolation would be more appropriate; perhaps this should be sought in the transmigration of souls.
(3) THE MATHEMATICAL OBJECTION. There are a number of results of mathematical logic which can be used to show that there are limitations to the powers of discrete state machines. The best known of these results is known as Gödel’s theorem (1931), and shows that in any sufficiently powerful logical system statements can be formulated which can neither be proved nor disproved within the system, unless possibly the system itself is inconsistent. There are other, in some respects similar, results due to Church (1936), Kleene (1936), Rosser (1936), and Turing (1937). The latter result is the most convenient to consider, since it refers directly to machines whereas the others can only be used in a comparatively indirect argument; for instance, if Gödel’s theorem is to be used we need in addition to have some means of describing logical systems in terms of machines, and machines in terms of logical systems. The result in question refers to a type of machine which is essentially a digital computer with an infinite capacity. It states that there are certain things that such a machine cannot do. If it is rigged up to give answers to questions as in the imitation game, there will be some questions to which it will either give a wrong answer, or fail to give an answer at all, however much time is allowed for a reply. There may, of course, be many such questions, and questions which cannot be answered by one machine may be satisfactorily answered by another. We are of course supposing for the present that the questions are of the kind to which an answer “Yes” or “No” is appropriate, rather than questions such as “What do you think of Picasso?” The questions that we know the machines must fail on are of this type, “Consider the machine specified as follows…Will this machine ever answer ‘Yes’ to any question?” The dots are to be replaced by a description of some machine in a standard form, which could be something like that used in Section 6.5.
When the machine described bears a certain comparatively simple relation to the machine which is under interrogation, it can be shown that the answer is either wrong or not forthcoming. This is the mathematical result; it is argued that it proves a disability of machines to which the human intellect is not subject.
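The self-referential trap behind such questions can be sketched in miniature. The sketch below is an informal illustration of the diagonal argument only, not the construction in the paper: if a predicate `decides_yes` could always predict whether a program will answer "Yes", a program built to do the opposite of the prediction about itself refutes the predicate. All names here are hypothetical.

```python
# Informal miniature of the diagonal argument (illustration only).
# Suppose some `decides_yes(program)` could always predict whether a given
# program ever answers "Yes". The program built below defeats any such
# predictor by doing the opposite of what is predicted about it.

def make_contrary(decides_yes):
    def contrary():
        # Answer "Yes" exactly when the predictor says we never will.
        return "No" if decides_yes(contrary) else "Yes"
    return contrary

def optimistic_predictor(program):   # a doomed candidate predictor
    return True                      # "it will answer Yes"

contrary = make_contrary(optimistic_predictor)
print(contrary())  # answers "No", contradicting the prediction
```

Whatever predictor is supplied, the contrary program built from it gives the answer the predictor ruled out, which is why the answer is "either wrong or not forthcoming".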
The short answer to this argument is that, although it is established that there are limitations to the powers of any particular machine, it has only been stated, without any sort of proof, that no such limitations apply to the human intellect. But I do not think this view can be dismissed quite so lightly. Whenever one of these machines is asked the appropriate critical question, and gives a definite answer, we know that this answer must be wrong, and this gives us a certain feeling of superiority. Is this feeling illusory? It is no doubt quite genuine, but I do not think too much importance should be attached to it. We too often give wrong answers to questions ourselves to be justified in being very pleased at such evidence of fallibility on the part of the machines. Further, our superiority can only be felt on such an occasion in relation to the one machine over which we have scored our petty triumph. There would be no question of triumphing simultaneously over all machines. In short, then, there might be men cleverer than any given machine, but then again there might be other machines cleverer again, and so on.
Those who hold to the mathematical argument would, I think, mostly be willing to accept the imitation game as a basis for discussion. Those who believe in the two previous objections would probably not be interested in any criteria.
(4) THE ARGUMENT FROM CONSCIOUSNESS. This argument is very well expressed in Professor Jefferson’s Lister Oration for 1949, from which I quote.
Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain—that is, not only write it but know that it had written it. No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.
This argument appears to be a denial of the validity of our test. According to the most extreme form of this view, the only way by which one could be sure that a machine thinks is to be the machine and to feel oneself thinking. One could then describe these feelings to the world, but of course no one would be justified in taking any notice. Likewise according to this view, the only way to know that a man thinks is to be that particular man. It is in fact the solipsist point of view. It may be the most logical view to hold but it makes communication of ideas difficult. A is liable to believe “A thinks but B does not” while B believes “B thinks but A does not”. Instead of arguing continually over this point, it is usual to have the polite convention that everyone thinks.
I am sure that Professor Jefferson does not wish to adopt the extreme and solipsist point of view. Probably he would be quite willing to accept the imitation game as a test. The game (with the player B omitted) is frequently used in practice under the name of viva voce to discover whether someone really understands something or has “learned it parrot fashion”. Let us listen in to a part of such a viva voce:
INTERROGATOR: In the first line of your sonnet, which reads “Shall I compare thee to a summer’s day,” would not “a spring day” do as well or better?
WITNESS: It wouldn’t scan.
INTERROGATOR: How about “a winter’s day”. That would scan all right.
WITNESS: Yes, but nobody wants to be compared to a winter’s day.
INTERROGATOR: Would you say Mr. Pickwick reminded you of Christmas?
WITNESS: In a way.
INTERROGATOR: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.
WITNESS: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.
And so on. What would Professor Jefferson say if the sonnet-writing machine were able to answer like this in the viva voce? I do not know whether he would regard the machine as “merely artificially signaling” these answers, but if the answers were as satisfactory and sustained as in the above passage I do not think he would describe it as “an easy contrivance”. This phrase is, I think, intended to cover such devices as the inclusion in the machine of a record of someone reading a sonnet, with appropriate switching to turn it on from time to time.
In short then, I think that most of those who support the argument from consciousness could be persuaded to abandon it rather than be forced into the solipsist position. They will then probably be willing to accept our test.
I do not wish to give the impression that I think there is no mystery about consciousness. There is, for instance, something of a paradox connected with any attempt to localize it. But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper.
(5) ARGUMENTS FROM VARIOUS DISABILITIES. These arguments take the form, “I grant you that you can make machines do all the things you have mentioned but you will never be able to make one to do X.” Numerous features X are suggested in this connection. I offer a selection:
Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humor, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as much diversity of behavior as a man, do something really new.
No support is usually offered for these statements. I believe they are mostly founded on the principle of scientific induction. A man has seen thousands of machines in his lifetime. From what he sees of them he draws a number of general conclusions. They are ugly, each is designed for a very limited purpose, when required for a minutely different purpose they are useless, the variety of behavior of any one of them is very small, and so on and so forth. Naturally he concludes that these are necessary properties of machines in general. Many of these limitations are associated with the very small storage capacity of most machines. (I am assuming that the idea of storage capacity is extended in some way to cover machines other than discrete state machines. The exact definition does not matter as no mathematical accuracy is claimed in the present discussion.) A few years ago, when very little had been heard of digital computers, it was possible to elicit much incredulity concerning them, if one mentioned their properties without describing their construction. That was presumably due to a similar application of the principle of scientific induction. These applications of the principle are of course largely unconscious. When a burned child fears the fire and shows that he fears it by avoiding it, I should say that he was applying scientific induction. (I could of course also describe his behavior in many other ways.) The works and customs of mankind do not seem to be very suitable material to which to apply scientific induction. A very large part of space-time must be investigated if reliable results are to be obtained. Otherwise we may (as most English children do) decide that everybody speaks English, and that it is silly to learn French.
There are, however, special remarks to be made about many of the disabilities that have been mentioned. The inability to enjoy strawberries and cream may have struck the reader as frivolous. Possibly a machine might be made to enjoy this delicious dish, but any attempt to make one do so would be idiotic. What is important about this disability is that it contributes to some of the other disabilities, for instance, to the difficulty of the same kind of friendliness occurring between man and machine as between white man and white man, or between black man and black man.
The claim that “machines cannot make mistakes” seems a curious one. One is tempted to retort, “Are they any the worse for that?” But let us adopt a more sympathetic attitude, and try to see what is really meant. I think this criticism can be explained in terms of the imitation game. It is claimed that the interrogator could distinguish the machine from the man simply by setting them a number of problems in arithmetic. The machine would be unmasked because of its deadly accuracy. The reply to this is simple. The machine (programmed for playing the game) would not attempt to give the right answers to the arithmetic problems. It would deliberately introduce mistakes in a manner calculated to confuse the interrogator. A mechanical fault would probably show itself through an unsuitable decision as to what sort of a mistake to make in the arithmetic. Even this interpretation of the criticism is not sufficiently sympathetic. But we cannot afford the space to go into it much further. It seems to me that this criticism depends on a confusion between two kinds of mistakes. We may call them “errors of functioning” and “errors of conclusion”. Errors of functioning are due to some mechanical or electrical fault which causes the machine to behave otherwise than it was designed to do. In philosophical discussions one likes to ignore the possibility of such errors; one is therefore discussing “abstract machines”. These abstract machines are mathematical fictions rather than physical objects. By definition they are incapable of errors of functioning. In this sense we can truly say that “machines can never make mistakes”. Errors of conclusion can only arise when some meaning is attached to the output signals from the machine. The machine might, for instance, type out mathematical equations, or sentences in English. When a false proposition is typed we say that the machine has committed an error of conclusion.
There is clearly no reason at all for saying that a machine cannot make this kind of mistake. It might do nothing but type out repeatedly “0 = 1”. To take a less perverse example, it might have some method for drawing conclusions by scientific induction. We must expect such a method to lead occasionally to erroneous results.
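The deliberate-mistake strategy described above is easy to sketch. The error rate and the perturbation scheme below are arbitrary choices for illustration; nothing in the paper specifies them.

```python
import random

def humanlike_sum(a, b, error_rate=0.1, rng=random):
    """Answer a + b, but occasionally slip on purpose, as a person might.

    The slip sizes (+/-1, +/-10) and the 10% default rate are arbitrary
    illustrative choices, not anything prescribed by the text.
    """
    answer = a + b
    if rng.random() < error_rate:
        answer += rng.choice([-10, -1, 1, 10])  # a plausible small slip
    return answer
```

A machine answering arithmetic this way commits errors of conclusion by design, and its "deadly accuracy" no longer unmasks it; a mechanical fault would instead show up as an implausible kind of slip.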
The claim that a machine cannot be the subject of its own thought can of course only be answered if it can be shown that the machine has some thought with some subject matter. Nevertheless, “the subject matter of a machine’s operations” does seem to mean something, at least to the people who deal with it. If, for instance, the machine were trying to find a solution of the equation x² − 40x − 11 = 0, one would be tempted to describe this equation as part of the machine’s subject matter at that moment. It may be used to help in making up its own programs, or to predict the effect of alterations in its own structure. By observing the results of its own behavior it can modify its own programs so as to achieve some purpose more effectively. These are possibilities of the near future, rather than Utopian dreams.
The criticism that a machine cannot have much diversity of behavior is just a way of saying that it cannot have much storage capacity. Until fairly recently a storage capacity of even a thousand digits was very rare.
The criticisms that we are considering here are often disguised forms of the argument from consciousness. Usually if one maintains that a machine can do one of these things, and describes the kind of method that the machine could use, one will not make much of an impression. It is thought that the method (whatever it may be, for it must be mechanical) is really rather base. Compare the parenthesis in Jefferson’s statement quoted above.
(6) LADY LOVELACE’S OBJECTION. Our most detailed information of Babbage’s Analytical Engine comes from a memoir by Lady Lovelace (1842). In it she states, “The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform” (her italics). This statement is quoted by Hartree (1949) who adds: “This does not imply that it may not be possible to construct electronic equipment which will ‘think for itself’, or in which, in biological terms, one could set up a conditioned reflex, which would serve as a basis for ‘learning’. Whether this is possible in principle or not is a stimulating and exciting question, suggested by some of these recent developments. But it did not seem that the machines constructed or projected at the time had this property.”
I am in thorough agreement with Hartree over this. It will be noticed that he does not assert that the machines in question had not got the property, but rather that the evidence available to Lady Lovelace did not encourage her to believe that they had it. It is quite possible that the machines in question had in a sense got this property. For suppose that some discrete state machine has the property. The Analytical Engine was a universal digital computer, so that, if its storage capacity and speed were adequate, it could by suitable programming be made to mimic the machine in question. Probably this argument did not occur to the Countess or to Babbage. In any case there was no obligation on them to claim all that could be claimed.
This whole question will be considered again under the heading of learning machines.
A variant of Lady Lovelace’s objection states that a machine can “never do anything really new”. This may be parried for a moment with the saw, “There is nothing new under the sun.” Who can be certain that “original work” that he has done was not simply the growth of the seed planted in him by teaching, or the effect of following well-known general principles. A better variant of the objection says that a machine can never “take us by surprise”. This statement is a more direct challenge and can be met directly. Machines take me by surprise with great frequency. This is largely because I do not do sufficient calculation to decide what to expect them to do, or rather because, although I do a calculation, I do it in a hurried, slipshod fashion, taking risks. Perhaps I say to myself, “I suppose the voltage here ought to be the same as there: anyway let’s assume it is.” Naturally I am often wrong, and the result is a surprise for me, for by the time the experiment is done these assumptions have been forgotten. These admissions lay me open to lectures on the subject of my vicious ways, but do not throw any doubt on my credibility when I testify to the surprises I experience.
I do not expect this reply to silence my critic. He will probably say that such surprises are due to some creative mental act on my part, and reflect no credit on the machine. This leads us back to the argument from consciousness, and far from the idea of surprise. It is a line of argument we must consider closed, but it is perhaps worth remarking that the appreciation of something as surprising requires as much of a “creative mental act” whether the surprising event originates from a man, a book, a machine or anything else.
The view that machines cannot give rise to surprises is due, I believe, to a fallacy to which philosophers and mathematicians are particularly subject. This is the assumption that as soon as a fact is presented to a mind all consequences of that fact spring into the mind simultaneously with it. It is a very useful assumption under many circumstances, but one too easily forgets that it is false. A natural consequence of doing so is that one then assumes that there is no virtue in the mere working out of consequences from data and general principles.
(7) ARGUMENT FROM CONTINUITY IN THE NERVOUS SYSTEM. The nervous system is certainly not a discrete state machine. A small error in the information about the size of a nervous impulse impinging on a neuron, may make a large difference to the size of the outgoing impulse. It may be argued that, this being so, one cannot expect to be able to mimic the behavior of the nervous system with a discrete state system.
It is true that a discrete state machine must be different from a continuous machine. But if we adhere to the conditions of the imitation game, the interrogator will not be able to take any advantage of this difference. The situation can be made clearer if we consider some other simpler continuous machine. A differential analyzer will do very well. (A differential analyzer is a certain kind of machine, not of the discrete state type, used for some types of calculation.) Some of these provide their answers in a typed form, and so are suitable for taking part in the game. It would not be possible for a digital computer to predict exactly what answers the differential analyzer would give to a problem, but it would be quite capable of giving the right sort of answer. For instance, if asked to give the value of π (actually about 3.1416) it would be reasonable to choose at random between the values 3.12, 3.13, 3.14, 3.15, 3.16 with the probabilities of 0.05, 0.15, 0.55, 0.19, 0.06 (say). Under these circumstances it would be very difficult for the interrogator to distinguish the differential analyzer from the digital computer.
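Turing's own numbers drop straight into a sketch of a digital computer imitating a differential analyzer: asked for π, it answers with the right sort of value rather than the exact one. The values and probabilities below are the ones given in the passage; only the function name is invented.

```python
import random

# The candidate values and their probabilities are the ones quoted above.
VALUES = [3.12, 3.13, 3.14, 3.15, 3.16]
PROBS  = [0.05, 0.15, 0.55, 0.19, 0.06]

def analyzer_style_pi(rng=random):
    """Answer like a differential analyzer: plausible, not exact."""
    return rng.choices(VALUES, weights=PROBS, k=1)[0]

print(analyzer_style_pi())  # e.g. 3.14, most of the time
```

Because the reply is drawn from a distribution rather than computed exactly, the interrogator has no systematic way to tell this digital imitation from the continuous machine.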
(8) THE ARGUMENT FROM INFORMALITY OF BEHAVIOR. It is not possible to produce a set of rules purporting to describe what a man should do in every conceivable set of circumstances. One might for instance have a rule that one is to stop when one sees a red traffic light, and to go if one sees a green one; but what if by some fault both appear together? One may perhaps decide that it is safest to stop. But some further difficulty may well arise from this decision later. To attempt to provide rules of conduct to cover every eventuality, even those arising from traffic lights, appears to be impossible. With all this I agree.
From this it is argued that we cannot be machines. I shall try to reproduce the argument, but I fear I shall hardly do it justice. It seems to run something like this: “If each man had a definite set of rules of conduct by which he regulated his life he would be no better than a machine. But there are no such rules, so men cannot be machines.” The undistributed middle is glaring. I do not think the argument is ever put quite like this, but I believe this is the argument used nevertheless. There may however be a certain confusion between “rules of conduct” and “laws of behavior” to cloud the issue. By “rules of conduct” I mean precepts such as “Stop if you see red lights”, on which one can act, and of which one can be conscious. By “laws of behavior” I mean laws of nature as applied to a man’s body such as “if you pinch him he will squeak”. If we substitute “laws of behavior which regulate his life” for “laws of conduct by which he regulates his life” in the argument quoted the undistributed middle is no longer insuperable. For we believe that it is not only true that being regulated by laws of behavior implies being some sort of machine (though not necessarily a discrete state machine), but that conversely being such a machine implies being regulated by such laws. However, we cannot so easily convince ourselves of the absence of complete laws of behavior as of complete rules of conduct. The only way we know of for finding such laws is scientific observation, and we certainly know of no circumstances under which we could say: “We have searched enough. There are no such laws.”
We can demonstrate more forcibly that any such statement would be unjustified. For suppose we could be sure of finding such laws if they existed. Then given a discrete state machine it should certainly be possible to discover by observation sufficient about it to predict its future behavior, and this within a reasonable time, say a thousand years. But this does not seem to be the case. I have set up on the Manchester computer a small program using only 1000 units of storage, whereby the machine supplied with one sixteen figure number replies with another within two seconds. I would defy anyone to learn from these replies sufficient about the program to be able to predict any replies to untried values.
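The flavour of Turing's challenge is easy to reproduce in miniature. The sketch below is not his Manchester program (whose details are not given); it is a stand-in of my own with the same character: a keyed quadratic map iterated on sixteen-figure numbers, whose input/output pairs give an observer essentially no basis for predicting replies to untried values.

```python
M = 10**16   # sixteen-figure numbers

def reply(n, key=2_718_281_828_459_045, rounds=20):
    """A stand-in 'reply machine': a keyed quadratic map iterated a
    few times mod 10**16. Deterministic, yet observing input/output
    pairs gives little purchase for predicting untried values."""
    x = n % M
    for _ in range(rounds):
        x = (x * x + key) % M
    return x

samples = {n: reply(n) for n in (1, 2, 3)}   # replies look unrelated
```

The point is not cryptographic strength but Turing's weaker claim: even a tiny deterministic program can defeat prediction-by-observation within any reasonable amount of sampling.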
(9) THE ARGUMENT FROM EXTRA-SENSORY PERCEPTION. I assume that the reader is familiar with the idea of extra-sensory perception, and the meaning of the four items of it, namely, telepathy, clairvoyance, precognition and psychokinesis. These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately the statistical evidence, at least for telepathy, is overwhelming. It is very difficult to rearrange one’s ideas so as to fit these new facts in. Once one has accepted them it does not seem a very big step to believe in ghosts and bogies. The idea that our bodies move simply according to the known laws of physics, together with some others not yet discovered but somewhat similar, would be one of the first to go.
This argument is to my mind quite a strong one. One can say in reply that many scientific theories seem to remain workable in practice, in spite of clashing with E.S.P.; that in fact one can get along very nicely if one forgets about it. This is rather cold comfort, and one fears that thinking is just the kind of phenomenon where E.S.P. may be especially relevant.
A more specific argument based on E.S.P. might run as follows: “Let us play the imitation game, using as witnesses a man who is good as a telepathic receiver, and a digital computer. The interrogator can ask such questions as ‘What suit does the card in my right hand belong to?’ The man by telepathy or clairvoyance gives the right answer 130 times out of 400 cards. The machine can only guess at random, and perhaps get 104 right, so the interrogator makes the right identification.” There is an interesting possibility which opens here. Suppose the digital computer contains a random number generator. Then it will be natural to use this to decide what answer to give. But then the random number generator will be subject to the psychokinetic powers of the interrogator. Perhaps this psychokinesis might cause the machine to guess right more often than would be expected on a probability calculation, so that the interrogator might still be unable to make the right identification. On the other hand, he might be able to guess right without any questioning, by clairvoyance. With E.S.P. anything may happen.
If telepathy is admitted it will be necessary to tighten our test. The situation could be regarded as analogous to that which would occur if the interrogator were talking to himself and one of the competitors was listening with his ear to the wall. To put the competitors into a “telepathy-proof room” would satisfy all requirements.
The reader will have anticipated that I have no very convincing arguments of a positive nature to support my views. If I had I should not have taken such pains to point out the fallacies in contrary views. Such evidence as I have I shall now give.
Let us return for a moment to Lady Lovelace’s objection, which stated that the machine can only do what we tell it to do. One could say that a man can “inject” an idea into the machine, and that it will respond to a certain extent and then drop into quiescence, like a piano string struck by a hammer. Another simile would be an atomic pile of less than critical size: an injected idea is to correspond to a neutron entering the pile from without. Each such neutron will cause a certain disturbance which eventually dies away. If, however, the size of the pile is sufficiently increased, the disturbance caused by such an incoming neutron will very likely go on and on, increasing until the whole pile is destroyed. Is there a corresponding phenomenon for minds, and is there one for machines? There does seem to be one for the human mind. The majority of them seem to be “subcritical”, that is, to correspond in this analogy to piles of subcritical size. An idea presented to such a mind will on an average give rise to less than one idea in reply. A smallish proportion are supercritical. An idea presented to such a mind may give rise to a whole “theory” consisting of secondary, tertiary and more remote ideas. Animals’ minds seem to be very definitely subcritical. Adhering to this analogy we ask, “Can a machine be made to be supercritical?”
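The atomic-pile analogy can be made quantitative as a branching process: if each idea provokes on average m ideas in reply, cascades die out when m < 1 and can grow without bound when m > 1. The following simulation is my own illustration of the analogy (the binomial reply distribution is an arbitrary modelling choice, not anything in the text):

```python
import random

def idea_cascade(mean_replies, rng, cap=2_000):
    """Inject one idea into a mind; every idea provokes a
    binomial(4, mean_replies / 4) number of ideas in reply, so the
    expected number of replies per idea is mean_replies. Returns
    the total number of ideas generated, stopping at `cap`."""
    pending, total = 1, 1
    while pending and total < cap:
        new = sum(rng.random() < mean_replies / 4 for _ in range(4 * pending))
        total += new
        pending = new
    return total

rng = random.Random(0)
subcritical = [idea_cascade(0.5, rng) for _ in range(100)]    # always fizzle
supercritical = [idea_cascade(1.5, rng) for _ in range(100)]  # often explode
```

Subcritical cascades produce only a handful of ideas before dying away, while a sizeable fraction of supercritical cascades run until the cap is reached, mirroring Turing's "whole theory" of secondary and tertiary ideas.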
The “skin of an onion” analogy is also helpful. In considering the functions of the mind or the brain we find certain operations which we can explain in purely mechanical terms. This we say does not correspond to the real mind: it is a sort of skin which we must strip off if we are to find the real mind. But then in what remains we find a further skin to be stripped off, and so on. Proceeding in this way, do we ever come to the “real” mind, or do we eventually come to the skin which has nothing in it? In the latter case the whole mind is mechanical. (It would not be a discrete state machine however. We have discussed this.)
These last two paragraphs do not claim to be convincing arguments. They should rather be described as “recitations tending to produce belief”.
The only really satisfactory support that can be given for the view expressed at the beginning of Section 6.6 will be that provided by waiting for the end of the century and then doing the experiment described. But what can we say in the meantime? What steps should be taken now if the experiment is to be successful?
As I have explained, the problem is mainly one of programming. Advances in engineering will have to be made too, but it seems unlikely that these will not be adequate for the requirements. Estimates of the storage capacity of the brain vary from 10^10 to 10^15 binary digits. I incline to the lower values and believe that only a very small fraction is used for the higher types of thinking. Most of it is probably used for the retention of visual impressions. I should be surprised if more than 10^9 was required for satisfactory playing of the imitation game, at any rate against a blind man. (Note: The capacity of the Encyclopedia Britannica, eleventh edition, is 2 × 10^9.) A storage capacity of 10^7 would be a very practicable possibility even by present techniques. It is probably not necessary to increase the speed of operations of the machines at all. Parts of modern machines which can be regarded as analogues of nerve cells work about a thousand times faster than the latter. This should provide a “margin of safety” which could cover losses of speed arising in many ways. Our problem then is to find out how to program these machines to play the game. At my present rate of working I produce about a thousand digits of program a day, so that about sixty workers, working steadily through the fifty years, might accomplish the job, if nothing went into the wastepaper basket. Some more expeditious method seems desirable.
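Turing's figures are easy to check. Assuming a working year of roughly 300 days (he gives no figure, so this is my assumption), sixty programmers producing a thousand digits a day for fifty years land just under the 10^9-digit budget he allows for the game:

```python
# Turing's estimates, as stated in the passage.
digits_per_day = 1_000           # his stated personal output
workers, years = 60, 50
working_days_per_year = 300      # assumption; Turing gives no figure

total_digits = workers * years * working_days_per_year * digits_per_day
assert total_digits == 9 * 10**8          # just under the 10**9 game budget

# Storage budgets, in binary digits.
practicable = 10**7                       # feasible "by present techniques"
game_budget = 10**9                       # his estimate for the game
brain_low, brain_high = 10**10, 10**15    # brain-capacity estimates
assert practicable < game_budget < brain_low <= brain_high
```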
In the process of trying to imitate an adult human mind we are bound to think a good deal about the process which has brought it to the state that it is in. We may notice three components:
(a) The initial state of the mind, say at birth;
(b) The education to which it has been subjected; and
(c) Other experience, not to be described as education, to which it has been subjected.
Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child-brain is something like a notebook as one buys it from the stationers. Rather little mechanism, and lots of blank sheets. (Mechanism and writing are from our point of view almost synonymous.) Our hope is that there is so little mechanism in the child-brain that something like it can be easily programmed. The amount of work in the education we can assume, as a first approximation, to be much the same as for the human child.
We have thus divided our problem into two parts—the child-program and the education process. These two remain very closely connected. We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications
| Structure of the child-machine | = | Hereditary material |
|---|---|---|
| Changes of the child-machine | = | Mutations |
| Judgment of the experimenter | = | Natural selection |
One may hope, however, that this process will be more expeditious than evolution. The survival of the fittest is a slow method for measuring advantages. The experimenter, by the exercise of intelligence, should be able to speed it up. Equally important is the fact that he is not restricted to random mutations. If he can trace a cause for some weakness he can probably think of the kind of mutation which will improve it.
It will not be possible to apply exactly the same teaching process to the machine as to a normal child. It will not, for instance, be provided with legs, so that it could not be asked to go out and fill the coal scuttle. Possibly it might not have eyes. But however well these deficiencies might be overcome by clever engineering, one could not send the creature to school without the other children making excessive fun of it. It must be given some tuition. We need not be too concerned about the legs, eyes, and so on. The example of Miss Helen Keller shows that education can take place provided that communication in both directions between teacher and pupil can take place by some means or other.
We normally associate punishments and rewards with the teaching process. Some simple child-machines can be constructed or programmed on this sort of principle. The machine has to be so constructed that events which shortly preceded the occurrence of a punishment-signal are unlikely to be repeated, whereas a reward-signal increases the probability of repetition of the events which led up to it. These definitions do not presuppose any feelings on the part of the machine. I have done some experiments with one such child-machine, and succeeded in teaching it a few things, but the teaching method was too unorthodox for the experiment to be considered really successful.
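The conditioning scheme described here (events shortly preceding a punishment-signal become less likely to be repeated, a reward-signal makes them more likely) can be sketched as a simple weight-update rule. The action set and the update factors below are illustrative choices of mine, not anything Turing specifies:

```python
import random

class ChildMachine:
    """Chooses actions with probabilities shaped purely by
    reward- and punishment-signals, with no built-in preferences."""
    def __init__(self, actions, seed=0):
        self.weights = {a: 1.0 for a in actions}
        self.rng = random.Random(seed)
        self.last = None

    def act(self):
        acts = list(self.weights)
        self.last = self.rng.choices(
            acts, weights=[self.weights[a] for a in acts])[0]
        return self.last

    def reward(self):   # events leading to reward become likelier
        self.weights[self.last] *= 2.0

    def punish(self):   # events preceding punishment become less likely
        self.weights[self.last] *= 0.5

# Condition the machine to prefer "greet" over "kick".
machine = ChildMachine(["greet", "kick"])
for _ in range(200):
    machine.reward() if machine.act() == "greet" else machine.punish()
```

After the training loop the weight on "greet" dwarfs the weight on "kick"; as Turing notes, nothing in these update rules presupposes any feelings on the part of the machine.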
The use of punishments and rewards can at best be a part of the teaching process. Roughly speaking, if the teacher has no other means of communicating to the pupil, the amount of information which can reach him does not exceed the total number of rewards and punishments applied. By the time a child has learned to repeat “Casabianca” he would probably feel very sore indeed, if the text could only be discovered by a “Twenty Questions” technique, every “No” taking the form of a blow. It is necessary therefore to have some other “unemotional” channels of communication. If these are available it is possible to teach a machine by punishments and rewards to obey orders given in some language, such as a symbolic language. These orders are to be transmitted through the “unemotional” channels. The use of this language will diminish greatly the number of punishments and rewards required.
Opinions may vary as to the complexity which is suitable in the child-machine. One might try to make it as simple as possible consistently with the general principles. Alternatively one might have a complete system of logical inference “built in”.2 In the latter case the store would be largely occupied with definitions and propositions. The propositions would have various kinds of status, such as well-established facts, conjectures, mathematically proved theorems, statements given by an authority, and expressions having the logical form of a proposition but no belief-value. Certain propositions may be described as “imperatives”. The machine should be so constructed that as soon as an imperative is classed as “well-established” the appropriate action automatically takes place. To illustrate this, suppose the teacher says to the machine, “Do your homework now.” This may cause “Teacher says ‘Do your homework now’” to be included among the well-established facts. Another such fact might be, “Everything that teacher says is true”. Combining these may eventually lead to the imperative, “Do your homework now”, being included amongst the well-established facts, and this, by the construction of the machine, will mean that the homework actually gets started; but the effect is very unsatisfactory. The processes of inference used by the machine need not be such as would satisfy the most exacting logicians. There might, for instance, be no hierarchy of types. But this need not mean that type fallacies will occur, any more than we are bound to fall over unfenced cliffs. Suitable imperatives (expressed within the system, not forming part of the rules of the system) such as “Do not use a class unless it is a subclass of one which has been mentioned by teacher” can have a similar effect to “Do not go too near the edge.”
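In modern terms the homework example is a two-step forward-chaining inference: from “teacher says S” and “everything teacher says is true”, derive S, and act on S as soon as it is classed as a well-established imperative. A toy sketch, with a string-based fact format of my own devising:

```python
def forward_chain(facts):
    """Tiny inference loop for the homework example: sentences uttered
    by the truthful teacher become well-established facts, and any
    well-established imperative (here, a fact starting with "do ")
    is executed as soon as it appears."""
    executed = []
    changed = True
    while changed:
        changed = False
        for f in list(facts):
            if f.startswith("teacher says: ") and "teacher is truthful" in facts:
                said = f[len("teacher says: "):]
                if said not in facts:
                    facts.add(said)      # now a well-established fact
                    changed = True
        for f in list(facts):
            if f.startswith("do ") and f not in executed:
                executed.append(f)       # the machine acts on the imperative
                changed = True
    return executed

done = forward_chain({"teacher says: do your homework now",
                      "teacher is truthful"})
print(done)   # ['do your homework now']
```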
The imperatives that can be obeyed by a machine that has no limbs are bound to be of a rather intellectual character, as in the example (doing homework) given above. Important among such imperatives will be ones which regulate the order in which the rules of the logical system concerned are to be applied. For at each stage when one is using a logical system, there is a very large number of alternative steps, any of which one is permitted to apply, so far as obedience to the rules of the logical system is concerned. These choices make the difference between a brilliant and a footling reasoner, not the difference between a sound and a fallacious one. Propositions leading to imperatives of this kind might be “When Socrates is mentioned, use the syllogism in Barbara” or “If one method has been proved to be quicker than another, do not use the slower method.” Some of these may be “given by authority”, but others may be produced by the machine itself, say by scientific induction.
The idea of a learning machine may appear paradoxical to some readers. How can the rules of operation of the machine change? They should describe completely how the machine will react whatever its history might be, whatever changes it might undergo. The rules are thus quite time-invariant. This is quite true. The explanation of the paradox is that the rules which get changed in the learning process are of a rather less pretentious kind, claiming only an ephemeral validity. The reader may draw a parallel with the Constitution of the United States.
An important feature of a learning machine is that its teacher will often be very largely ignorant of quite what is going on inside, although he may still be able to some extent to predict his pupil’s behavior. This should apply most strongly to the later education of a machine arising from a child-machine of well-tried design (or program). This is in clear contrast with normal procedure when using a machine to do computations: one’s object is then to have a clear mental picture of the state of the machine at each moment in the computation. This object can only be achieved with a struggle. The view that “the machine can only do what we know how to order it to do”,3 appears strange in face of this. Most of the programs which we can put into the machine will result in its doing something that we cannot make sense of at all, or which we regard as completely random behavior. Intelligent behavior presumably consists in a departure from the completely disciplined behavior involved in computation, but a rather slight one, which does not give rise to random behavior, or to pointless repetitive loops. Another important result of preparing our machine for its part in the imitation game by a process of teaching and learning is that “human fallibility” is likely to be admitted in a rather natural way, that is, without special “coaching”. (The reader should reconcile this with the point of view on p. 113.) Processes that are learned do not produce a hundred percent certainty of result; if they did they could not be unlearned.
It is probably wise to include a random element in a learning machine (see p. 105). A random element is rather useful when we are searching for a solution of some problem. Suppose for instance we wanted to find a number between 50 and 200 which was equal to the square of the sum of its digits, we might start at 51 then try 52 and go on until we got a number that worked. Alternatively we might choose numbers at random until we got a good one. This method has the advantage that it is unnecessary to keep track of the values that have been tried, but the disadvantage that one may try the same one twice; but this is not very important if there are several solutions. The systematic method has the disadvantage that there may be an enormous block without any solutions in the region which has to be investigated first. Now the learning process may be regarded as a search for a form of behavior which will satisfy the teacher (or some other criterion). Since there is probably a very large number of satisfactory solutions, the random method seems to be better than the systematic. It should be noticed that it is used in the analogous process of evolution. But there the systematic method is not possible. How could one keep track of the different genetical combinations that had been tried, so as to avoid trying them again?
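Turing's number puzzle has exactly one solution in the range, 81 = (8 + 1)^2, so both of the search methods he compares fit in a few lines; note that the random version keeps no record of the values already tried:

```python
import random

def digit_square(n):
    """True if n equals the square of the sum of its digits."""
    s = sum(int(d) for d in str(n))
    return s * s == n

# Systematic search: start at 51 and scan upward.
def systematic():
    n = 51
    while not digit_square(n):
        n += 1
    return n

# Random search: draw values until one works; no bookkeeping,
# at the cost of possibly trying the same value twice.
def randomised(rng):
    while True:
        n = rng.randrange(50, 201)
        if digit_square(n):
            return n

print(systematic())                     # 81, since (8 + 1)**2 == 81
print(randomised(random.Random(0)))     # also 81: the only solution
```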
We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best. It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. This process could follow the normal teaching of a child. Things would be pointed out and named, and so on. Again I do not know what the right answer is, but I think both approaches should be tried.
We can only see a short distance ahead, but we can see plenty there that needs to be done.
1. Possibly this view is heretical. St. Thomas Aquinas (Summa Theologica, quoted in Russell, 1945, p. 458) states that God cannot make a man to have no soul. But this may not be a real restriction on His powers, but only a result of the fact that men’s souls are immortal, and therefore indestructible.
2. Or rather “programmed in” for our child-machine will be programmed in a digital computer. But the logical system will not have to be learned.
3. Compare Lady Lovelace’s statement (p. 114), which does not contain the word “only”.
Hector J. Levesque
2014
This paper is about the science of AI. Unfortunately, it is the technology of AI that gets all the attention. The general public could be forgiven for thinking that AI is just about all those whiz-bang applications, smart this and autonomous that. Those of us in the field know that for many applications, the term “intelligent” is no more than a buzzword (like the term “delicious” in “red delicious apples”). And along with the many possibly beneficial AI applications under consideration, we often have serious misgivings about the potential misuse of AI technology (in areas like weaponry).
But AI is more than just technology. Many of us are motivated not by any of the AI applications currently being considered, but by the scientific enterprise, the attempt to understand the world around us. Different sciences have different subject matters, and AI is the study of intelligent behaviour in computational terms. What could be more fascinating? The human brain is a remarkable thing, perhaps the single most complex object we know of in the universe. But even more remarkable is what a human brain is capable of doing. Our intelligent behaviour at its best goes well beyond what we have any right to expect to emerge out of purely physical matter. Indeed, the overarching question for the science of AI is: How is it possible for something physical (like people, for instance) to actually do X?
where X is one of the many instances of intelligent behaviour. This needs to be contrasted with a related question: Can we engineer a computer system to do something that is vaguely X-ish?
about which we will have much more to say later.
Note that the science of AI studies intelligent behaviour, not who or what is producing the behaviour. It studies natural language understanding, for instance, not natural language understanders. This is what makes AI quite different from the study of people (in neuroscience, psychology, cognitive science, evolutionary biology, and so on).
What sort of behaviour do we care about? Different researchers will quite naturally focus on different aspects. The behaviour may or may not depend on perceptual or motor skills. It may or may not include learning. It may or may not be grounded in emotional responses, or in social interactions. For some researchers, the main concern is intelligent behaviour seen in a variety of animals, like the ability to find a desired object in a room. For others, the focus is on behaviour seen in humans only, like the ability to play chess. (These two groups sometimes engage in methodological disputes, with the former arguing that we cannot expect to understand human behaviour until we understand its more basic forms, and the latter responding that this is not how science works at all. At this stage of the game, there is really no reason to take a doctrinaire position one way or another.)
In this paper, I intend to examine one basic form of intelligent behaviour: answering certain ad-hoc questions posed in English. Consider a question like the following:
Could a crocodile run a steeplechase?
Even if you know what crocodiles and steeplechases are,1 you have never really thought about this question before, unless you happened to have read an early paper of mine (Levesque, 1988). Nor can you simply look up the correct answer somewhere. And yet, an answer does occur to you almost immediately. Here is another question from the same paper:
Should baseball players be allowed to glue small wings onto their caps?
Again, you have never thought of this before, but again an answer occurs to you. (In this case, you might even wonder if there is some sort of trick to the question that you may have missed. There is none.)
In this paper, I want to consider our ability to answer one-shot questions like these, and for four reasons:
1. This is behaviour that is clearly exhibited by people. We are indeed capable of answering questions like these without any special training or instructions.
2. This is behaviour that is difficult to crack. We have as yet no good idea about what people do to answer them. No existing computer program can duplicate our ability.
3. Our behaviour in answering questions like these appears to underlie other more complex (and more ecologically significant) forms of behaviour.
4. Being clear and precise about the form of behaviour we care about even in this simple case will also help clarify what it means for the science of AI to be successful.
As we will see, however, there will be good reasons to move to answering questions of a more restricted form.
Given some form of intelligent behaviour, how do we know that the computational story told by AI researchers actually explains the behaviour? The answer, going all the way back to Turing, is this: a computational account is adequate if it is able to generate behaviour that cannot be distinguished over the long haul from the behaviour produced by people.
This, of course, harks back to the famous Turing Test (Turing, 1950; Chapter 6 of this volume). We imagine an extended conversation over a teletype between an interrogator and two participants, a person and a computer. The conversation is natural, free-flowing, and about any topic whatsoever. The computer is said to pass the Turing Test if no matter how long the conversation, the interrogator cannot tell which of the two participants is the person.
Turing’s point in all this, it seems to me, is this: Terms like “intelligent,” “thinking,” “understanding,” and the like are much too vague and emotionally charged to be worth arguing about. If we insist on using them in a scientific context at all, we should be willing to say that a program that can pass a suitable behavioural test has the property in question as much as the person. Adapting the dictum of the movie character Forrest Gump who said “Stupid is as stupid does,” we can imagine Turing saying “Intelligent is as intelligent does.” This is a very sensible position, it seems to me, and I have defended it elsewhere (Levesque, 2009).
However, I do feel that the Turing Test has a serious problem: it relies too much on deception. A computer program passes the test iff it can fool an interrogator into thinking she is dealing with a person not a computer. Consider the interrogator asking questions like these:
How tall are you?
or
Tell me about your parents.
To pass the test, a program will either have to be evasive (and duck the question) or manufacture some sort of false identity (and be prepared to lie convincingly). In fact, evasiveness is seen quite clearly in the annual Loebner Competition, a restricted version of the Turing Test.2 The “chatterbots” (as the computer entrants in the competition are called) rely heavily on wordplay, jokes, quotations, asides, emotional outbursts, points of order, and so on. Everything, it would appear, except clear and direct answers to questions!
The ability to fool people is interesting, no doubt, but not really what is at issue here.3 We might well ask: is there a better behaviour test than having a free-form conversation?
There are some quite reasonable non-English options to consider, such as “captchas” (Von Ahn et al., 2003). But English is an excellent medium since it allows us to range over topics broadly and flexibly (and guard against biases: age, education, culture, etc.).
But here is another option: what if instead of a conversation, the interrogator only asks a number of multiple-choice questions? This has some distinct advantages:
We want multiple-choice questions that people can answer easily. But we also want to avoid as much as possible questions that can be answered using cheap tricks (aka heuristics).
Consider for example, the question posed earlier:
Could a crocodile run a steeplechase?
The intent here is clear. The question can be answered by thinking it through: a crocodile has short legs; the hedges in a steeplechase would be too tall for the crocodile to jump over; so no, a crocodile cannot run a steeplechase.
The trouble is that there is another way to answer the question that does not require this level of understanding. The idea is to use the closed world assumption (Reiter, 1978; Collins et al., 1975). This assumption says (among other things) the following: If you can find no evidence for the existence of something, assume that it does not exist.
For the question above, since I have never heard of a crocodile being able to run a steeplechase, I conclude that it cannot. End of story. Note that this is a cheap trick: it gets the answer right, but for dubious reasons. It would produce the wrong answer for a question about gazelles, for example. Nonetheless, if all we care about is answering the crocodile question correctly, then this cheap trick does the trick.
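The closed-world cheap trick can be made concrete in a few lines. The following sketch is a hypothetical illustration, assuming a tiny knowledge base of facts the system has “heard of”; anything absent from it is assumed false:

```python
# A minimal sketch of the closed world assumption as a cheap trick:
# answer "no" whenever no supporting evidence can be found.
# The knowledge base and question encoding here are hypothetical.

KNOWN_FACTS = {
    ("gazelle", "run fast"),
    ("crocodile", "swim"),
}

def closed_world_answer(subject: str, ability: str) -> bool:
    """Return True only if we have positive evidence; otherwise
    assume, by the closed world assumption, that the answer is no."""
    return (subject, ability) in KNOWN_FACTS

# Gets the crocodile question right, but for dubious reasons:
print(closed_world_answer("crocodile", "run a steeplechase"))  # False
# ...and would answer "no" about a gazelle in exactly the same way.
print(closed_world_answer("gazelle", "run a steeplechase"))    # False
```

The trick never reasons about legs or hedge heights; it simply exploits the absence of evidence, which is why it fails whenever absence of evidence and evidence of absence come apart.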
Can we find questions where cheap tricks like this will not be sufficient to produce the desired behaviour? This unfortunately has no easy answer. The best we can do, perhaps, is to come up with a suite of multiple-choice questions carefully and then study the sorts of computer programs that might be able to answer them. Here are some obvious guidelines:
One existing promising approach in this direction is the recognizing textual entailment challenge (Dagan et al., 2005; Bobrow et al., 2007). But it has problems of its own, and so here we propose a different one.
Our approach is best illustrated with an example question:5
Joan made sure to thank Susan for all the help she had given. Who had given the help?
A Winograd schema question is a binary-choice question with these properties:
In the above, the special word used is “given,” and the other word is “received.” So each Winograd schema actually generates two very similar questions:
Joan made sure to thank Susan for all the help she had given. Who had given the help?
and
Joan made sure to thank Susan for all the help she had received. Who had received the help?
It is this one-word difference between the two questions that helps guard against using the cheapest of tricks on them.
Here are some additional examples. The first is one that is suitable even for young children:
The trophy would not fit in the brown suitcase because it was so small. What was so small?
In this case, the special word used is “small” and the other word is “big.” Here is the original example due to Terry Winograd (1972a) for whom the schema is named:
The town councillors refused to give the angry demonstrators a permit because they feared violence. Who feared violence?
Here the special word is “feared” and the alternative word is “advocated.”
With a bit of care, it is possible to come up with Winograd schema questions that exercise different kinds of expertise. Here is an example concerning certain materials:
The large ball crashed right through the table because it was made of styrofoam. What was made of styrofoam?
The special word is “styrofoam” and the alternative is “steel.” This one tests for problem-solving skill:
The sack of potatoes had been placed below the bag of flour, so it had to be moved first. What had to be moved first?
The special word is “below” and the alternative is “above.” This example tests for an ability to visualize:
Sam tried to paint a picture of shepherds with sheep, but they ended up looking more like golfers. What looked like golfers?
The special word used is “golfers” and the other is “dogs.”
Of course not just any question in this form will do the job here. It is possible to construct questions that are too “easy,” like this one:
The racecar easily passed the school bus because it was going so fast. What was going so fast?
The problem is that this question can be answered using the following trick: ignore the given sentence, and check which two words co-occur more frequently (according to Google, say): “racecar” with “fast” or “school bus” with “fast.” Questions can also be too “hard,” like this one:
Frank was jealous when Bill said that he was the winner of the competition. Who was the winner?
The problem is that this question is ambiguous when the “happy” variant is used. Frank could plausibly be happy because he is the winner or because Bill is. Further discussion on these and other issues can be found in Levesque et al. (2012).
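The co-occurrence trick mentioned above for the “easy” racecar question can be mimicked directly. The hit counts below are made-up numbers standing in for web-scale statistics (from a search engine, say); the function simply ignores the sentence:

```python
# Hypothetical hit counts standing in for web-scale co-occurrence
# statistics. The trick ignores the given sentence entirely.
COOCCURRENCE = {
    ("racecar", "fast"): 90_000,
    ("school bus", "fast"): 4_000,
}

def cheap_referent(candidates, cue):
    """Pick whichever candidate co-occurs more often with the cue
    word, paying no attention to the sentence being asked about."""
    return max(candidates, key=lambda c: COOCCURRENCE.get((c, cue), 0))

print(cheap_referent(["racecar", "school bus"], "fast"))  # racecar
```

This is exactly why such questions are too “easy”: statistics about word pairs settle the answer without any understanding of the sentence at all.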
It is now possible to formulate an alternative to the Turing Test. A collection of pre-tested Winograd schemas can be hidden in a library.6 A Winograd Schema Test involves asking a number of these questions with a strong penalty for wrong answers (to preclude guessing). A test can be administered and graded in a fully automated way:
1. select N (e.g., N = 25) questions that are suitable (with respect to vocabulary, expertise, etc.);
2. randomly use one of the special words in the question;
3. present the test to the subject, and obtain the N binary replies;
The final grade for the test is (C − k·W)/N, where C and W are the numbers of correct and wrong replies, and k codes the penalty for guessing (e.g., k = 5). The main claim here is that normally-abled English-speaking adults will pass the test easily. So, if we want to produce behaviour that is indistinguishable from that of people, we will need to come up with a program that can also pass the test.
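The fully automated administration and grading can be sketched as follows. The scoring rule (correct answers minus a penalty of k per wrong answer, normalized by the number of questions) is one plausible reading of the scheme described here, and the schema data format is a hypothetical illustration:

```python
import random

def administer_test(schemas, answer_fn, k=5):
    """Administer and grade a Winograd Schema Test automatically.

    `schemas` is a list of (template, special_word, other_word, answers)
    tuples, where `answers` maps each word variant to its correct
    referent. `answer_fn` is the subject under test. This data format
    is a hypothetical illustration, not a standard.
    """
    n = len(schemas)
    correct = wrong = 0
    for template, special, other, answers in schemas:
        word = random.choice([special, other])   # randomly pick a variant
        question = template.format(word=word)    # pose the question
        if answer_fn(question) == answers[word]:
            correct += 1
        else:
            wrong += 1
    return (correct - k * wrong) / n             # penalize guessing

schemas = [(
    "The trophy would not fit in the brown suitcase because it was so "
    "{word}. What was so {word}?",
    "small", "big",
    {"small": "the trophy", "big": "the suitcase"},
)]

def human_like(question):
    # A subject that resolves the pronoun the way people do.
    return "the trophy" if "small" in question else "the suitcase"

print(administer_test(schemas, human_like))  # 1.0 for a perfect subject
```

A subject that guesses gets the variant wrong half the time, and the penalty k makes its expected grade strongly negative, which is the point of the penalty.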
To summarize: With respect to the Turing Test, we agree with Turing that the substantive question is whether or not a certain intelligent behaviour can be achieved by a computer program. But a free-form conversation as advocated by Turing may not be the best vehicle for a formal test, as it allows a cagey subject to hide behind a smokescreen of playfulness, verbal tricks, and canned responses. Our position is that an alternative test based on Winograd schema questions is less subject to abuse, though clearly much less demanding intellectually than engaging in a cooperative conversation (about sonnets, for example, as imagined by Turing).
What would it take for a computer program to pass a Winograd schema test? My feeling is that we can go quite some distance with the following:
1. Take a Winograd schema question such as
The trophy would not fit in the brown suitcase because it was so small. What was so small?
and parse it into the following form:
Two parties are in relation R.
One of them has property P. Which?
For the question above, this gives the following:
R = does not fit in; P = is so small.
2. Then use big data: search all the English text on the web to determine which is the more common pattern:
This “big data” approach is an excellent trick, but unfortunately, it is still too cheap. Among other things, it ignores the connective between R and P. Consider this:
The trophy would not fit in the brown suitcase despite the fact that it was so small. What was so small?
Note that the R and P here would be the same as before, even though the answer must be different this time.
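The point can be made concrete: a parse that keeps only R and P assigns the very same representation to the “because” and “despite” sentences, even though the connective flips the answer. The representation below is a hypothetical sketch, with a stand-in for a real parser:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RPParse:
    """A parse that keeps only the relation R and the property P,
    discarding the connective between them."""
    R: str
    P: str

def parse_rp(sentence: str) -> RPParse:
    # A stand-in for a real parser: both variants of the trophy
    # sentence reduce to the same (R, P) pair.
    return RPParse(R="does not fit in", P="is so small")

because = parse_rp("The trophy would not fit in the brown suitcase "
                   "because it was so small.")
despite = parse_rp("The trophy would not fit in the brown suitcase "
                   "despite the fact that it was so small.")
print(because == despite)  # True: the connective is lost
```

Since the big-data step sees only (R, P), it is forced to give the same answer to both questions, and so must get one of them wrong.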
Now consider the following example:
Fred is the only man alive who still remembers my father as an infant. When Fred first saw my father, he was twelve years old. Who was twelve years old?
Here the relationship between any R and P is clearly much more complex.
So what do we conclude from this? Do we simply need a bigger bag of tricks?
There is a tendency in AI to focus on behaviour in a purely statistical sense. We ask:
Can we engineer a system to produce a desired behaviour with no more errors than people would produce (with confidence level z)?
Looking at behaviour this way can allow some of the more challenging examples that arise (like the question concerning Fred above) to simply be ignored when they are not statistically significant.
Unfortunately, this can lead us to systems with very impressive performance that are nonetheless idiot-savants. We might produce prodigies at chess, face-recognition, Jeopardy, and so on, that are completely hopeless outside their area of expertise.7
But there is another way of looking at all this. Think of the behaviour of people on Winograd schema questions as a natural phenomenon to be explained, not unlike photosynthesis or gravity. In this case, even a single example can tell us something important about how people are able to behave, however insignificant statistically.
Reconsider, for instance, the styrofoam / steel question from above. We might consider using other special words in the question: for “balsa wood,” the answer would be “the table,” for “granite,” it would be “the large ball,” and so on. But suppose we use an unknown word in the question:
The large ball crashed right through the table because it was made of XYZZY. What was made of XYZZY?
Here there is no “correct” answer: subjects should not really favor one answer much over the other.
But suppose we had told the subjects some facts about the XYZZY material:8
1. It is a trademarked product of Dow Chemical.
2. It is usually white, but there are green and blue varieties.
3. It is ninety-eight percent air, making it lightweight and buoyant.
4. It was first discovered by a Swedish inventor, Carl Georg Munters.
We can ask, on learning any of these facts, at what point do the subjects stop guessing? It should be clear that only one of these facts really matters, the third one. But more generally, people get the right answer for styrofoam precisely because they already know something like the third fact above about the makeup of styrofoam. This background knowledge is critical; without it, the behaviour is quite different.
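The role of background knowledge can be sketched directly: without a relevant fact about the material, the system has no basis beyond guessing; with it, the referent is determined. The property names and fact encoding below are hypothetical illustrations:

```python
# Hypothetical background knowledge: material -> set of known properties.
FACTS = {}

def referent(material):
    """Answer 'What was made of <material>?' for the ball/table
    question, or None when the relevant knowledge is missing."""
    props = FACTS.get(material, set())
    if "lightweight" in props:
        return "the table"       # a light material gives way
    if "heavy" in props:
        return "the large ball"  # a heavy ball does the crashing
    return None                  # no basis beyond guessing

print(referent("XYZZY"))  # None: an unknown word settles nothing

# Irrelevant facts (trademark, colour) change nothing:
FACTS["XYZZY"] = {"trademarked", "white"}
print(referent("XYZZY"))  # still None

# Only the fact about its makeup ("ninety-eight percent air") matters:
FACTS["XYZZY"].add("lightweight")
print(referent("XYZZY"))  # the table
```

The behaviour flips at exactly one point, when the third kind of fact arrives, mirroring the observation that people answer the styrofoam question correctly only because they already know something like it.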
So what do we learn from this experiment about the answering of Winograd schema questions? From a pure technology point of view, a reasonable question to ask here is this:
Can we produce a good semblance of the target behaviour without having to deal with background knowledge like this?
But from a science point of view, we must take a different stance. We want to understand what it takes to produce the intelligent behaviour that people exhibit. So the question really needs to be more like this:
What kind of system would have the necessary background knowledge to be able to behave the way people do?
So to account for what people are actually able to do, we need to consider what it would take to have a system that knows a lot about its world and can apply that knowledge as needed, the way people can.
One possibility is this:
This is a very radical idea, first proposed by John McCarthy in a quite extraordinary and unprecedented paper (McCarthy, 1968). It suggests that we should put aside any idea of tricks and shortcuts, and focus instead on what needs to be known, how to represent it symbolically, and how to use the representations.
I do not want to suggest that with McCarthy’s radical idea on board, it is all smooth sailing from here. A good question to ask is why, after 55 years, we have so little to show for it regarding the science of intelligent behaviour. The answer, I believe, is that it leaves some major issues unresolved.
My Computers and Thought Lecture at IJCAI-85 (Levesque, 1986) was in part a reaction to the “Knowledge is Power” slogan which was quite in vogue at the time. It all seemed too facile to me, even back then. My sense was that knowledge was not power if it could not be acquired in a suitable symbolic form, or if it could not be applied in a tractable way. These point to two significant hurdles faced by the McCarthy approach:
1. Much of what we come to know about world and the people around us is not from personal experience, but is due to our use of language.
People talk to us, we listen to weather reports and to the dialogue in movies, and we read: text messages, sport scores, mystery novels, etc.
And yet, it appears that we need to use extensive knowledge to make good sense of all this language.
2. Even the most basic child-level knowledge seems to call upon a wide range of logical constructs.
Cause and effect and non-effect, counterfactuals, generalized quantifiers, uncertainty, other agents’ beliefs, desires and intentions, etc.
And yet, symbolic reasoning over these constructs seems to be much too demanding computationally.
I believe that these two hurdles are as serious and as challenging to the science of AI as an accelerating universe is to astrophysics. After 55 years, we might well wonder if an AI researcher will ever be able to overcome them.
Life being short (and “time to market” even shorter), it is perhaps not surprising that many AI researchers have returned to less radical methods (e.g., more biologically-based, more like statistical mechanics) to focus on behaviours that are seemingly less knowledge-intensive (e.g., recognizing handwritten digits, following faces in a crowd, walking over rough terrain). And the results have been terrific!
But these terrific results should not put us into denial. Our best behaviour does include knowledge-intensive activities such as participating in natural conversations, or responding to Winograd schema questions. It is my hope that enough of us stay focused on this sort of intelligent behaviour to allow progress to continue here as well.
This will require hard work! I think it is unreasonable to expect solutions to emerge spontaneously out of a few general principles, obviating any real effort on our parts. For example, I do not think we will ever be able to build a small computer program, give it a camera and a microphone or put it on the web, and expect it to acquire what it needs all by itself.
So the work will be hard. But to my way of thinking, it will be more like scaling a mountain than shoveling a driveway. Hard work, yes, but an exhilarating adventure!
What about those hurdles? Obviously, I have no solutions. However, I do have some suggestions for my colleagues in the Knowledge Representation area:
1. We need to return to our roots in Knowledge Representation and Reasoning for language and from language.
We should not treat English text as a monolithic source of information. Instead, we should carefully study how simple knowledge bases might be used to make sense of the simple language needed to build slightly more complex knowledge bases, and so on.
2. It is not enough to build knowledge bases without paying closer attention to the demands arising from their use.
We should explore more thoroughly the space of computations between fact retrieval and full automated logical reasoning. We should study in detail the effectiveness of linear modes of reasoning (like unit propagation, say) over constructs that logically seem to demand more.
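One concrete instance of such a linear mode of reasoning is unit propagation over propositional clauses. The sketch below is a minimal illustration; the clause encoding (lists of signed integers for literals) is a common convention, not something drawn from the text:

```python
def unit_propagate(clauses):
    """Unit propagation over CNF clauses (lists of int literals,
    negative = negated). It draws only the conclusions forced by
    unit clauses, never branching or searching."""
    assignment = {}  # variable -> bool
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if len(unassigned) == 1:
                lit = unassigned[0]       # the clause forces this literal
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

# (p) and (not-p or q): propagation derives p, then q.
print(unit_propagate([[1], [-1, 2]]))  # {1: True, 2: True}
```

Unlike full resolution or model search, each pass does only forced, local work, which is why such modes of reasoning remain tractable even when the constructs they operate on logically seem to demand more.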
As to the rest of the AI community, I do have a final recommendation:
We should avoid being overly swayed by what appears to be the most promising approach of the day.
As a field, I believe that we tend to suffer from what might be called serial silver bulletism, defined as follows:
the tendency to believe in a silver bullet for AI, coupled with the belief that previous beliefs about silver bullets were hopelessly naïve.
We see this in the fads and fashions of AI research over the years: first, automated theorem proving is going to solve it all; then, the methods appear too weak, and we favour expert systems; then the programs are not situated enough, and we move to behaviour-based robotics; then we come to believe that learning from big data is the answer; and on it goes.
I think there is a lot to be gained by recognizing more fully what our own research does not address, and being willing to admit that other AI approaches may be needed for dealing with it. I believe this will help minimize the hype, put us in better standing with our colleagues, and allow progress in AI to proceed in a steadier fashion.
Finally, let me conclude with a question about the future:
Will a computer ever pass the Turing Test (as first envisaged by Turing) or even a broad Winograd Schema Test (without cheap tricks)?
The answer to this question, I believe, lies in a quote from Alan Kay: “The best way to predict the future is to invent it.” I take this to mean that the question is not really for the pundits to debate. The question, in the end, is really about us, how much perseverance and inventiveness we will bring to the task. And I, for one, have the greatest confidence in what we can do when we set our minds to it.
1. For those who do not know, a steeplechase is a horse race, similar to the usual ones, but where the horses must jump over a number of hedges on the racetrack. So it is like hurdles for horses.
2. See the book by Brian Christian (2011) for an interesting account of what it was like to play the human in a Loebner contest.
3. The ELIZA program (Weizenbaum, 1966) is a good place to start on that issue.
4. The program at www.trueknowledge.com appears to work this way.
5. This section is drawn mainly from Levesque et al. (2012). I thank Ernie Davis and Leora Morgenstern for their contribution.
6. See, for example, the collection at http://www.cs.nyu.edu/faculty/davise/papers/WS.html.
7. Indeed, it would be good fun to try Watson on Winograd schema questions: the category is “Pronoun referents,” the clue is “Joan made sure to thank Susan for all the help she had given,” and the desired answer in the form of a question is “Who is Susan?”
8. These facts were lifted from the Wikipedia page for styrofoam.
Stuart J. Russell
1997
AI is a field whose ultimate goal has often been somewhat ill-defined and subject to dispute. Some researchers aim to emulate human cognition, others aim at the creation of intelligence without concern for human characteristics, and still others aim to create useful artifacts without concern for abstract notions of intelligence.
This variety is not necessarily a bad thing, since each approach uncovers new ideas and provides fertilization to the others. But one can argue that, since philosophers abhor a definitional vacuum, many of the damaging and ill-informed debates about the feasibility of AI have been about definitions of AI to which we as AI researchers do not subscribe.
My own motivation for studying AI is to create and understand intelligence as a general property of systems, rather than as a specific attribute of humans. I believe this to be an appropriate goal for the field as a whole, and it certainly includes the creation of useful artifacts—both as a spin-off and as a focus and driving force for technological development. The difficulty with this “creation of intelligence” view, however, is that it presupposes that we have some productive notion of what intelligence is. Cognitive scientists can say “Look, my model correctly predicted this experimental observation of human cognition”, and artifact developers can say “Look, my system is saving lives/megabucks”, but few of us are happy with papers saying “Look, my system is intelligent”. This difficulty is compounded further by the need for theoretical scaffolding to allow us to design complex systems with confidence and to build on the results of others. “Intelligent” must be given a definition that can be related directly to the system’s input, structure, and output. Such a definition must also be general. Otherwise, AI subsides into a smorgasbord of fields—intelligence as chess playing, intelligence as vehicle control, intelligence as medical diagnosis.
In this paper, I shall outline the development of such definitions over the history of AI and related disciplines. I shall examine each definition as a predicate P that can be applied, supposedly, to characterize systems that are intelligent. For each P, I shall discuss whether the statement “Look, my system is P” is interesting and at least sometimes true, and the sort of research and technological development to which the study of P-systems leads.
I shall begin with the idea that intelligence is strongly related to the capacity for successful behaviour—the so-called “agent-based” view of AI. The candidates for formal definitions of intelligence are as follows:
All four definitions will be fleshed out in detail, and I will describe some results that have been obtained so far along these lines. Then I will describe ongoing and future work under the headings of calculative rationality and bounded optimality.
I shall be arguing that, of these candidates, bounded optimality comes closest to meeting the needs of AI research. There is always a danger, in this sort of claim, that its acceptance can lead to “premature mathematization”, a condition characterized by increasingly technical results that have increasingly little to do with the original problem—in the case of AI, the problem of creating intelligence. Is research on bounded optimality a suitable stand-in for research on intelligence? I hope to show that P4, bounded optimality, is more suitable than P1 through P3 because it is a real problem with real and desirable solutions, and also because it satisfies some essential intuitions about the nature of intelligence. Some important questions about intelligence can only be formulated and answered within the framework of bounded optimality or some relative thereof. Only time will tell, however, whether bounded optimality research, perhaps with additional refinements, can generate enough theoretical scaffolding to support significant practical progress in AI.
Until fairly recently, it was common to define AI as the computational study of “mental faculties” or “intelligent systems”, catalogue various kinds, and leave it at that. This does not provide much guidance. Instead, one can define AI as the problem of designing systems that do the right thing. Now we just need a definition for “right”.
This approach involves considering the intelligent entity as an agent, that is to say a system that senses its environment and acts upon it. Formally speaking, an agent is defined by the mapping from percept sequences to actions that the agent instantiates. Let O be the set of percepts that the agent can observe at any instant, and A be the set of possible actions the agent can carry out in the external world (including the action of doing nothing). Thus the agent function f: O* → A defines how an agent behaves under all circumstances. What counts in the first instance is what the agent does, not necessarily what it thinks, or even whether it thinks at all. This initial refusal to consider further constraints on the internal workings of the agent (such as that it should reason logically, for example) helps in three ways: first, it allows us to view such “cognitive faculties” as planning and reasoning as occurring in the service of finding the right thing to do; second, it encompasses rather than excludes the position that systems can do the right thing without such cognitive faculties (Agre and Chapman, 1987; Brooks, 1989); third, it allows more freedom to consider various specifications, boundaries, and interconnections of subsystems.
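The agent function f : O* → A is just a mapping from percept sequences to actions, and can be rendered as code in a few lines. The percept and action vocabularies below are made-up illustrations; only the input-output mapping matters:

```python
from typing import Sequence

# Hypothetical percept and action vocabularies.
O = {"obstacle", "clear"}
A = {"stop", "go", "noop"}

def agent_f(percepts: Sequence[str]) -> str:
    """An agent function f : O* -> A. It fixes what the agent does
    for every percept sequence, saying nothing about how (or whether)
    the agent 'thinks' in order to do it."""
    if not percepts:
        return "noop"  # doing nothing is itself an action
    return "stop" if percepts[-1] == "obstacle" else "go"

print(agent_f([]))                     # noop
print(agent_f(["clear", "obstacle"]))  # stop
print(agent_f(["obstacle", "clear"]))  # go
```

Any internal machinery (planning, logical reasoning, a lookup table) that instantiates the same mapping counts as the same agent, which is exactly the freedom the agent-based view is meant to secure.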
The agent-based view of AI has moved quickly from workshops on “situatedness” and “embeddedness” to mainstream textbooks (Dean et al., 1995; Russell and Norvig, 1995) and buzzwords in Newsweek. Rational agents, loosely speaking, are agents whose actions make sense from the point of view of the information possessed by the agent and its goals (or the task for which it was designed). Rationality is a property of actions and does not specify—although it does constrain—the process by which the actions are selected. This was a point emphasized by Simon (1956), who coined the terms substantive rationality and procedural rationality to describe the difference between the question of what decision to make and the question of how to make it. That Rod Brooks’s 1991 Computers and Thought lecture was titled “Intelligence without Reason” (see also Brooks, 1991) emphasizes the fact that reasoning is (perhaps) a derived property of agents that might, or might not, be a good implementation scheme to achieve rational behaviour. Justifying the cognitive structures that many AI researchers take for granted is not an easy problem.
One other consequence of the agent-based view of intelligence is that it opens AI up to competition from other fields that have traditionally looked on the embedded agent as a natural topic of study. Control theory is foremost among these, but evolutionary programming and indeed evolutionary biology itself also have ideas to contribute.1
The prevalence of the agent view has also helped the field move towards solving real problems, avoiding what Brooks calls the “hallucination” problem that arises when the fragility of a subsystem is masked by having an intelligent human providing input to it and interpreting its outputs.
Perfect rationality constrains an agent’s actions to provide the maximum expectation of success given the information available. We can expand this notion as follows (see figure 8.1). The fundamental inputs to the definition are the environment class E in which the agent is to operate and the performance measure U which evaluates the sequence of states through which the agent drives the actual environment. Let V(f, E, U) denote the expected value according to U obtained by an agent function f in environment class E, where (for now) we will assume a probability distribution over elements of E. Then a perfectly rational agent is defined by an agent function fopt such that

fopt = argmax_f V(f, E, U)
Figure 8.1
The agent receives percepts from the environment and generates a behaviour which in turn causes the environment to generate a state history. The performance measure evaluates the state history to arrive at the value of the agent.
This is just a fancy way of saying that the best agent does the best it can. The point is that perfectly rational behaviour is a well-defined function of E and U, which I will call the task environment. The problem of computing this function is addressed below.
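Since V(f, E, U) is just an expectation over the environment class, it can at least be estimated by sampling. A hedged sketch, assuming a toy drift environment, agent, and performance measure (all invented for illustration):

```python
import random

class DriftEnv:
    """Toy environment: the state drifts upward unless damped."""
    def __init__(self, rng):
        self.state = rng.uniform(-1.0, 1.0)
    def percept(self):
        return self.state
    def step(self, action):
        self.state += -0.5 * self.state if action == "damp" else 0.1

def damping_agent(percepts):
    return "damp" if abs(percepts[-1]) > 0.01 else "noop"

def U(states):
    return -sum(abs(s) for s in states)   # prefer state histories near 0

def estimate_V(f, make_env, U, n=1000, horizon=10, seed=0):
    """Monte Carlo estimate of V(f, E, U): sample environments from the
    class, run the agent function on the growing percept sequence, and
    average the utility of the resulting state histories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        env = make_env(rng)
        percepts, states = [], []
        for _ in range(horizon):
            percepts.append(env.percept())
            env.step(f(percepts))         # the agent drives the environment
            states.append(env.state)
        total += U(states)
    return total / n

v_damp = estimate_V(damping_agent, DriftEnv, U)
v_noop = estimate_V(lambda percepts: "noop", DriftEnv, U)
```

On this toy class the damping agent scores strictly better than the do-nothing agent, which is exactly the comparison V(f, E, U) is meant to support.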
The theoretical role of perfect rationality within AI is well-described by Newell’s paper on the Knowledge Level (Newell, 1982). Knowledge-level analysis of AI systems relies on an assumption of perfect rationality. It can be used to establish an upper bound on the performance of any possible system, by establishing what a perfectly rational agent would do given the same knowledge.
Although the knowledge that a perfectly rational agent has determines the actions that it will take given its goals, the question of where the knowledge comes from is not well understood. That is, we need to understand rational learning as well as rational action. In the logical view of rationality, learning has received almost no attention—indeed, Newell’s analysis precludes learning at the knowledge level. In the decision-theoretic view, Bayesian updating provides a model for rational learning, but this pushes the question back to the prior (Carnap, 1950). The question of rational priors, particularly for expressive representation languages, remains unsettled.
Another aspect of perfect rationality that is lacking is the development of a suitable body of techniques for the specification of utility functions. In economics, many results have been derived on the decomposition of overall utility into attributes that can be combined in various ways (Keeney et al., 1976), yet such methods have made few inroads into AI (but see Bacchus and Grove, 1995; Wellman, 1985). We also have little idea how to specify utility over time, and although the question has been raised often, we do not have a satisfactory understanding of the relationship between goals and utility.
The good thing about perfectly rational agents is that if you have one handy, you prefer it to any other agent. Furthermore, if you are an economist you can prove nice results about economies populated by them; and if you want to design distributed intelligent systems, assuming perfect rationality on the part of each agent makes the design of the interaction mechanisms much easier. The bad thing is that the theory of perfect rationality does not provide for the analysis of the internal design of the agent: one perfectly rational agent is as good as another. The really bad thing, as pointed out by Simon, is that perfectly rational agents do not exist. Physical mechanisms take time to process information and select actions, hence the behaviour of real agents cannot immediately reflect changes in the environment and will generally be suboptimal.
Before discussing calculative rationality, it is necessary to introduce a distinction between the agent function and the agent program. In AI, an agent is implemented as a program, which I shall call l, running on a machine, which I shall call M. An agent program receives as input the current percept, but also has internal state that reflects, in some form, the previous percepts. It outputs actions when they have been selected. From the outside, the behaviour of the agent consists of the selected actions interspersed with inaction (or whatever default actions the machine generates).
Calculative rationality is displayed by programs that, if executed infinitely fast, would result in perfectly rational behaviour. Unlike perfect rationality, calculative rationality is a requirement that can be fulfilled by many real programs. Also unlike perfect rationality, calculative rationality is not necessarily a desirable property. For example, a calculatively rational chess program will choose the “right” move, but may take 10⁵⁰ times too long to do so.
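The chess example can be miniaturized: exhaustive minimax is calculatively rational (it returns the game-theoretically correct value given unlimited time) but scales exponentially, and tic-tac-toe is about the largest game for which it stays comfortable. A sketch:

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if b[i] != " " and b[i] == b[j] == b[k]:
            return b[i]
    return None

@lru_cache(maxsize=None)
def minimax(b, player):
    """Exact value of 9-character board b for 'X': +1 win, 0 draw, -1 loss.
    Calculatively rational: always correct, but exhaustive."""
    w = winner(b)
    if w:
        return 1 if w == "X" else -1
    if " " not in b:
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(b[:i] + player + b[i + 1:], nxt)
              for i, c in enumerate(b) if c == " "]
    return max(values) if player == "X" else min(values)
```

The same procedure applied to chess would be equally correct and utterly useless, which is the point of the distinction.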
The pursuit of calculative rationality has nonetheless been the main activity of theoretically well-founded research in AI. In the early stages of the field, it was important to concentrate on “epistemological adequacy” before “heuristic adequacy”—that is, capability in principle rather than in practice.2 Calculative rationality has been the mainstay of both the logical and the decision-theoretic traditions. In the logical tradition, the performance measure accepts behaviours that achieve the specified goal in all cases and rejects any others. Thus Newell (1982) defines rational actions as those that are guaranteed to achieve one of the agent’s goals. Logical planning systems, such as theorem-provers using situation calculus, satisfy the conditions of calculative rationality under this definition. In the decision-theoretic tradition, the design of calculatively rational agents has largely gone on outside AI—for example, in stochastic optimal control theory (Kumar and Varaiya, 1986). Representations have usually been very impoverished (state-based rather than sentential) and solvable problems have been either very small or very specialized. Within AI, the development of probabilistic networks or belief networks has opened up many new possibilities for agent design, providing in many cases an exponential reduction in representational complexity. Systems based on influence diagrams (probabilistic networks with action and value nodes added) satisfy the decision-theoretic version of calculative rationality.
In practice, neither the logical nor the decision-theoretic traditions can avoid the intractability of the decision problems posed by the requirement of calculative rationality. One response is to rule out sources of exponential complexity in the representations and reasoning tasks addressed, so that calculative and perfect rationality coincide—at least, if we ignore the little matter of polynomial-time computation. This position was expounded in two fascinating Computers and Thought lectures given by Hector Levesque in 1985 (Levesque, 1986; Levesque and Brachman, 1987) and by Henry Kautz in 1989. The accompanying research results on tractable sublanguages are perhaps best seen as indications of where complexity may be an issue rather than as a solution to the problem of complexity. The idea of restricting expressiveness was strongly opposed by Doyle and Patil (1991), who pointed out that it also restricts the applicability of the representation and inference services designed under such constraints.3
In the area of distributed AI, the system designer has control over that part of each agent’s environment that involves negotiations with other agents. Thus, one possible way to control complexity is to constrain the negotiation problem so that optimal decisions can be made easily. For example, the Clarke Tax mechanism can be used to ensure that the best policy for each agent is simply to state its preferences truthfully (Ephrati and Rosenschein, 1991). Of course, this approach does not necessarily result in optimal behaviour by the ensemble of agents; nor does it solve the problem of complexity in interacting with the rest of the environment.
The most common response to complexity has been to use various speedup techniques and approximations in the hope of getting reasonable behaviour. AI has developed a very powerful armoury of methods for reducing complexity, including the decomposition of state representations into sentential form; sparse representations of environment models (as in STRIPS operators); solution decomposition methods such as partial-order planning and abstraction; approximate, parameterized representations of value functions for reinforcement learning; compilation (chunking, macro-operators, EBL, etc.); and the application of metalevel control. Although some of these methods can retain guarantees of optimality and are effective for moderately large problems that are well structured, it is inevitable that intelligent agents will be unable to act rationally in all circumstances. This observation has been a commonplace since the very beginning of AI. Yet systems that select suboptimal actions fall outside calculative rationality per se, and we need a better theory to understand them.
Metalevel rationality, also called Type II rationality by I. J. Good (1971), is based on the idea of finding an optimal tradeoff between computational costs and decision quality. Although Good never made his concept of Type II rationality very precise—he defines it as “the maximization of expected utility taking into account deliberation costs”—it is clear that the aim was to take advantage of some sort of metalevel architecture to implement this tradeoff. Metalevel architecture is a design philosophy for intelligent agents that divides the agent into two (or more) notional parts. The object level carries out computations concerned with the application domain—for example, projecting the results of physical actions, computing the utility of certain states, and so on. The metalevel is a second decision-making process whose application domain consists of the object-level computations themselves and the computational objects and states that they affect. Metareasoning has a long history in AI, going back at least to the early 1970s (see Russell and Wefald, 1991a for historical details). One can also view selective search methods and pruning strategies as embodying metalevel expertise concerning the desirability of pursuing particular object-level search operations.
The theory of rational metareasoning formalizes the intuition that the metalevel can “do the right thinking.” The basic idea is that object-level computations are actions with costs (the passage of time) and benefits (improvements in decision quality). A rational metalevel selects computations according to their expected utility. Rational metareasoning has as a precursor the theory of information value (Howard, 1966)—the notion that one can calculate the decision-theoretic value of acquiring an additional piece of information by simulating the decision process that would be followed given each possible outcome of the information request, thereby estimating the expected improvement in decision quality averaged over those outcomes. The application to computational processes, by analogy to information-gathering, seems to have originated with Matheson (1968). In AI, Horvitz (1987a,b), Breese and Fehling (1990), and Russell and Wefald (1989, 1991a,b) all showed how the idea of value of computation could solve the basic problems of real-time decision-making.
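The value-of-computation idea can be made concrete with a one-step (myopic) estimate: simulate the possible outcomes of a computation, see how the best action choice would change under each, and subtract the cost of the time spent. A sketch with invented numbers:

```python
def myopic_voc(estimates, outcomes, probs, time_cost):
    """One-step (myopic) value of a computation that would refine the
    utility estimate of action 0.  'outcomes' are the possible revised
    estimates for action 0, with probabilities 'probs'.  VOC = expected
    utility of acting after the computation, minus the utility of acting
    now, minus the cost of deliberation."""
    act_now = max(estimates)
    after = 0.0
    for u0, p in zip(outcomes, probs):
        after += p * max([u0] + estimates[1:])   # act optimally afterwards
    return after - act_now - time_cost

# Action 0 looks slightly worse (0.5 vs 0.6) but is uncertain; a cheap
# computation that might reveal it is worth 0.9 has positive value.
voc = myopic_voc([0.5, 0.6], outcomes=[0.9, 0.1], probs=[0.5, 0.5],
                 time_cost=0.02)
```

If the possible revisions could never overturn the current choice, the same formula returns a negative value and the rational metalevel acts immediately.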
The work done with Eric Wefald looked in particular at search algorithms, in which the object-level computations extend projections of the results of various courses of action further into the future. For example, in chess programs, each object-level computation expands a leaf node of the game tree. The metalevel problem is then to select nodes for expansion and to terminate search at the appropriate point. The principal problem with metareasoning in such systems is that the local effects of the computations do not directly translate into improved decisions, because there is also a complex process of propagating the local effects at the leaf back to the root and the move choice. It turns out that a general formula for the value of computation can be found in terms of the “local efforts” and the “propagation function”, such that the formula can be instantiated for any particular object-level system (such as minimax propagation), compiled, and executed efficiently at runtime. This method was implemented for two-player games, two-player games with chance nodes, and single-agent search. In each case, the same general metareasoning scheme resulted in efficiency improvements of roughly an order of magnitude over traditional, highly-engineered algorithms.
Another general class of metareasoning problems arises with anytime (Dean and Boddy, 1988) or flexible (Horvitz, 1987a) algorithms, which are algorithms designed to return results whose quality varies with the amount of time allocated to computation. The simplest type of metareasoning trades off the expected increase in decision quality for a single algorithm, as measured by a performance profile, against the cost of time (Simon, 1955). A greedy termination condition is optimal if the second derivative of the performance profile is negative. More complex problems arise if one wishes to build complex real-time systems from anytime components. First, one has to ensure the interruptibility of the composed system—that is, to ensure that the system as a whole can respond robustly to immediate demands for output. The solution is to interleave the execution of all the components, allocating time to each component so that the total time for each complete iterative improvement cycle of the system doubles at each iteration. In this way, we can construct a complex system that can handle arbitrary and unexpected real-time demands exactly as if it knew the exact time available in advance, with just a small ( ≤ 4) constant factor penalty in speed (Russell and Zilberstein, 1991). Second, one has to allocate the available computation optimally among the components to maximize the total output quality. Although this is NP-hard for the general case, it can be solved in time linear in program size when the call graph of the components is tree-structured (Zilberstein and Russell, 1996). Although these results are derived in the relatively clean context of anytime algorithms with well-defined performance profiles, there is reason to expect that the general problem of robust real-time decision-making in complex systems can be handled in practice.
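The interleaving construction is easy to check numerically: run a component in bursts of length 1, 2, 4, ... and interrupt at an arbitrary time t; the longest completed contiguous run is always within a factor of 4 of t, which is where the small constant-factor penalty comes from. A sketch:

```python
def best_completed(t):
    """Run a component in bursts of length 1, 2, 4, ...; return the
    longest burst fully completed by interruption time t."""
    elapsed, best, burst = 0, 0, 1
    while elapsed + burst <= t:
        elapsed += burst
        best = burst
        burst *= 2
    return best

# For every interruption time t, the longest completed run is within a
# factor of 4 of the time a clairvoyant contract algorithm would be given.
worst = max(t / best_completed(t) for t in range(1, 10000))
```

The worst case occurs just before a burst finishes: the ratio approaches but never reaches 4.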
Over the last few years, an interesting debate has emerged concerning the nature of metaknowledge and metareasoning. TEIRESIAS (Davis, 1980) established the idea that explicit, domain-specific metaknowledge was an important aspect of expert system creation. Thus, metaknowledge is a sort of “extra” domain knowledge, over and above the object-level domain knowledge, that one has to add to an AI system to get it to work well. On the other hand, in the work on rational metareasoning described above, it is clear that the metatheory describing the effects of computations is domain-independent (Ginsberg and Geddis, 1991; Russell and Wefald, 1991a). In principle, no additional domain knowledge is needed to assess the benefits of a computation. In practice, metareasoning from first principles can be very expensive. To avoid this, the results of metalevel analysis for particular domains can be compiled into domain-specific metaknowledge, or such knowledge can be learned directly from experience (see Russell and Wefald, 1991a, Chapter 6, and Minton, 1996). This view of emerging “computational expertise” leads to a fundamental insight into intelligence—namely, that there is an interesting sense in which algorithms are not a necessary part of AI systems. Instead, one can imagine a general process of rationally guided computation interacting with properties of the environment to produce more and more efficient decision making. To my mind, this way of thinking finesses one major puzzle of AI: if what is required for AI is incredibly devious and superbly efficient algorithms far surpassing the current best efforts of computer scientists, how did evolution (and how will machine learning) ever get there?
Significant open problems remain in the area of rational metareasoning. One obvious difficulty is that almost all systems to date have adopted a myopic strategy—a greedy, depth-one search at the metalevel. Obviously, the problem of optimal selection of computation sequences is at least as intractable as the underlying object-level problem. Nonetheless, sequences must be considered because in some cases the value of a computation may not be apparent as an improvement in decision quality until further computations have been done. This suggests that techniques from reinforcement learning could be effective, especially as the “reward function” for computation—that is, the improvement in decision quality—is easily available to the metalevel post hoc. Other possible areas for research include the creation of effective metalevel controllers for more complex systems such as abstraction hierarchy planners, hybrid architectures, and so on.
Although rational metareasoning seems to be a useful tool in coping with complexity, the concept of metalevel rationality as a formal framework for resource-bounded agents does not seem to hold water. The reason is that, since metareasoning is expensive, it cannot be carried out optimally. The history of object-level rationality has repeated itself at the metalevel: perfect rationality at the metalevel is unattainable and calculative rationality at the metalevel is useless. Therefore, a time/optimality tradeoff has to be made for metalevel computations, as for example with the myopic approximation mentioned above. Within the framework of metalevel rationality, however, there is no way to identify the appropriate tradeoff of time for metalevel decision quality. Any attempt to do so via a metametalevel simply results in a conceptual regress. Furthermore, it is entirely possible that in some environments, the most effective agent design will do no metareasoning at all, but will simply respond to circumstances. These considerations suggest that the right approach is to step outside the agent, as it were; to refrain from micromanaging the individual decisions made by the agent. This is the approach taken in bounded optimality.
The difficulties with perfect rationality and metalevel rationality arise from the imposition of constraints on things (actions, computations) that the agent designer does not directly control. Specifying that actions or computations be rational is of no use if no real agents can fulfill the specification. The designer controls the program. In Russell and Subramanian (1995), the notion of feasibility for a given machine is introduced to describe the set of all agent functions that can be implemented by some agent program running on that machine. This is somewhat analogous to the idea of computability, but is much stricter because it relates the operation of a program on a formal machine model with finite speed to the actual temporal behaviour generated by the agent.
Given this view, one is led immediately to the idea that optimal feasible behaviour is an interesting notion, and to the idea of finding the program that generates it. Suppose we define Agent(l, M) to be the agent function implemented by the program l running on machine M. Then the bounded optimal program lopt is defined by

lopt = argmax_{l ∈ ℒM} V(Agent(l, M), E, U)
where ℒM is the finite set of all programs that can be run on M. This is P4, bounded optimality.
In AI, the idea of bounded optimality floated around among several discussion groups interested in the general topic of resource-bounded rationality in the late 1980s, particularly those at Rockwell (organized by Michael Fehling) and Stanford (organized by Michael Bratman). The term “bounded optimality” seems to have been coined by Eric Horvitz (1987b), who defined it informally as “the optimization of computational utility given a set of assumptions about expected problems and constraints on resources”.
Similar ideas have also surfaced recently in game theory, where there has been a shift from consideration of optimal decisions in games to a consideration of optimal decision-making programs. This leads to different results because it limits the ability of each agent to do unlimited simulation of the other, who is also doing unlimited simulation of the first, and so on. Even the requirement of computability makes a significant difference (Megiddo and Wigderson, 1986). Bounds on the complexity of players have also become a topic of intense interest. Papadimitriou and Yannakakis (1994) have shown that a collaborative equilibrium exists for the iterated Prisoner’s Dilemma game if each agent is a finite automaton with a number of states that is less than exponential in the number of rounds. This is essentially a bounded optimality result, where the bound is on space rather than speed of computation.
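The flavour of such results can be seen by encoding strategies as small finite automata. A sketch (the payoff matrix is the standard Prisoner’s Dilemma; the automaton encoding is my own, for illustration):

```python
# Standard PD payoffs for (my_move, their_move); C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# A strategy automaton: (start_state, {state: (action, {opp_move: next_state})}).
TIT_FOR_TAT = ("C", {"C": ("C", {"C": "C", "D": "D"}),
                     "D": ("D", {"C": "C", "D": "D"})})
ALWAYS_DEFECT = ("D", {"D": ("D", {"C": "D", "D": "D"})})

def play(m1, m2, rounds):
    """Play the iterated PD between two automata; return total scores."""
    s1, s2 = m1[0], m2[0]
    score1 = score2 = 0
    for _ in range(rounds):
        a1, a2 = m1[1][s1][0], m2[1][s2][0]
        score1 += PAYOFF[(a1, a2)]
        score2 += PAYOFF[(a2, a1)]
        s1 = m1[1][s1][1][a2]       # transition on the opponent's move
        s2 = m2[1][s2][1][a1]
    return score1, score2
```

Two-state Tit-for-Tat sustains mutual cooperation against itself; the interest of the Papadimitriou–Yannakakis result lies in what happens when the number of states is bounded relative to the number of rounds.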
Philosophy has also seen a gradual evolution in the definition of rationality. There has been a shift from consideration of act utilitarianism—the rationality of individual acts—to rule utilitarianism, or the rationality of general policies for acting. The requirement that policies be feasible for limited agents was discussed extensively by Cherniak (1986) and Harman (1983). A philosophical proposal generally consistent with the notion of bounded optimality can be found in the “Moral First Aid Manual” (Dennett, 1986). Dennett explicitly discusses the idea of reaching an optimum within the space of feasible decision procedures, using as an example the Ph.D. admissions procedure of a philosophy department. He points out that the bounded optimal admissions procedure may be somewhat messy and may have no obvious hallmark of “optimality”—in fact, the admissions committee may continue to tinker with it since bounded optimal systems may have no way to recognize their own bounded optimality.
In work with Devika Subramanian, the general idea of bounded optimality has been placed in a formal setting so that one can begin to derive rigorous results on bounded optimal programs. This involves setting up completely specified relationships among agents, programs, machines, environments, and time. We found this to be a very valuable exercise in itself. For example, the “folk AI” notions of “real-time environments” and “deadlines” ended up with definitions rather different than those we had initially imagined. From this foundation, a very simple machine architecture was investigated in which the program consists of decision procedures of fixed execution time and decision quality. In a “stochastic deadline” environment, it turns out that the utility attained by running several procedures in sequence until interrupted is often higher than that attainable by any single decision procedure. That is, it is often better first to prepare a “quick and dirty” answer before embarking on more involved calculations in case the latter do not finish in time.
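The “quick and dirty first” effect is easy to reproduce in a toy model: the deadline arrives uniformly at random in [0, T], each decision procedure has a fixed duration and decision quality, and the agent uses the best answer completed before the deadline strikes. All numbers below are invented for illustration:

```python
def expected_quality(procedures, T):
    """Expected decision quality under a deadline uniform on [0, T],
    running (duration, quality) procedures in sequence and keeping the
    best answer completed before the deadline strikes."""
    points, finish, best = [], 0.0, 0.0
    for duration, quality in procedures:
        finish += duration
        best = max(best, quality)
        points.append((finish, best))    # quality available from 'finish' on
    expected, prev_t, prev_q = 0.0, 0.0, 0.0
    for t, q in points + [(T, None)]:
        t = min(t, T)
        expected += prev_q * (t - prev_t) / T   # deadline lands in [prev_t, t)
        prev_t = t
        if q is not None:
            prev_q = q
    return expected

quick, slow = (2.0, 0.5), (8.0, 1.0)
seq = expected_quality([quick, slow], T=20.0)  # quick answer first, then refine
alone = expected_quality([slow], T=20.0)       # committed to the slow procedure
```

With these invented numbers the quick-then-slow sequence comes out ahead (expected quality 0.7 versus 0.6), mirroring the stochastic-deadline result described above.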
The interesting aspect of these results, beyond their value as a demonstration of nontrivial proofs of bounded optimality, is that they exhibit in a simple way what I believe to be a major feature of bounded optimal agents: the fact that the pressure towards optimality within a finite machine results in more complex program structures. Intuitively, efficient decision-making in a complex environment requires a software architecture that offers a wide variety of possible computational options, so that in most situations the agent has at least some computations available that provide a significant increase in decision quality.
One possible objection to the basic model of bounded optimality outlined above is that solutions are not robust with respect to small variations in the environment or the machine. This in turn would lead to difficulties in analysing complex system designs. Theoretical computer science faced the same problem in describing the running time of algorithms: counting exact steps on a precisely specified instruction set yields equally fragile results about optimal algorithms. The O() notation was developed to deal with this problem, and it provides a much more robust way to describe complexity that is independent of machine speeds and implementation details. This robustness is also essential in allowing complexity results to develop cumulatively. In Russell and Subramanian (1995), the corresponding notion is asymptotic bounded optimality (ABO). As with classical complexity, we can define both average-case and worst-case ABO, where “case” here refers to the environment. For example, worst-case ABO is defined as follows:
Worst-case asymptotic bounded optimality. An agent program l is timewise (or space-wise) worst-case ABO in E on M iff

∃k, n₀  ∀l′, n   n > n₀  ⇒  V*(Agent(l, kM), E, U, n) ≥ V*(Agent(l′, M), E, U, n)
where kM denotes a version of M speeded up by a factor k (or with k times more memory) and V*(f, E, U, n) is the minimum value of V(f, E, U) for all E in E of complexity n.
In English, this means that the program is basically along the right lines if it just needs a faster (larger) machine to have worst-case behaviour as good as that of any other program in all environments.
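The "just needs a faster machine" reading parallels asymptotic complexity, and can be made concrete with invented cost functions: a program whose running time has the right growth rate can match the best program once the machine is sped up by a constant factor.

```python
# Illustrative running times as a function of environment complexity n.
cost_best = lambda n: n * n           # hypothetical best program on M
cost_mine = lambda n: 7 * n * n + 50  # same growth rate, worse constants

def speedup_needed(cost_a, cost_b, n_range):
    """Smallest integer k such that cost_a(n)/k <= cost_b(n) for all n in
    n_range, i.e. machine kM running program a keeps pace with program b."""
    return max(-(-cost_a(n) // cost_b(n)) for n in n_range)  # ceiling division

k = speedup_needed(cost_mine, cost_best, range(10, 1000))
# k is a small constant here; a program with the wrong growth rate
# (say exponential) would need k to grow with n, and so would not be ABO.
```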
Another possible objection to the idea of bounded optimality is that it simply shifts the intractable computational burden of metalevel rationality from the agent’s metalevel to the designer’s object level. Surely, one might argue, the designer now has to solve offline all the metalevel optimization problems that were intractable online. This argument is not without merit—indeed, it would be surprising if the agent design problem turned out to be easy. There is, however, a significant difference between the two problems, in that the agent designer is presumably creating an agent for an entire class of environments, whereas the putative metalevel agent is working in a specific environment. That this can make the problem easier for the designer can be seen by considering the example of sorting algorithms. It may be very difficult indeed to sort a list of a trillion elements, but it is relatively easy to design an asymptotically optimal algorithm for sorting. In fact, the difficulties of the two tasks are unrelated. The unrelatedness would still hold for BO as well as ABO design, but the ABO definitions make it a good deal clearer.
It can be shown easily that worst-case ABO is a generalization of asymptotically optimal algorithms, simply by constructing a “classical environment” in which classical algorithms operate and in which the utility of the algorithm’s behaviour is a decreasing positive function of runtime if the output is correct and zero otherwise. Agents in more general environments may need to trade off output quality for time, generate multiple outputs over time, and so on. As an illustration of how ABO is a useful abstraction, one can show that under certain restrictions one can construct universal ABO programs that are ABO for any time variation in the utility function, using the doubling construction from Russell and Zilberstein (1991). Further directions for bounded optimality research are discussed below.
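The doubling construction can be sketched as follows. This is a simplified rendering, not the construction from Russell and Zilberstein (1991) itself: a contract algorithm (one that must know its time allocation in advance) is made interruptible by rerunning it with budgets 1, 2, 4, …, so that whenever an interrupt arrives, a result computed with a budget within a constant factor of the elapsed time is available.

```python
def interruptible(contract, deadline):
    """Make a contract algorithm usable under an unknown deadline by
    rerunning it from scratch with doubling budgets: 1, 2, 4, ... time
    units. `contract(t)` returns the result achievable with budget t."""
    elapsed, budget, result = 0, 1, None
    while elapsed + budget <= deadline:
        result = contract(budget)   # rerun with a larger time budget
        elapsed += budget
        budget *= 2
    return result

# Illustrative contract algorithm: result quality grows with allocated time.
quality = lambda t: 1 - 2.0 ** (-t)

# Interrupted at time 15, the last completed run had budget 8 (1+2+4+8 = 15),
# so the available result lags the elapsed time by only a constant factor.
r = interruptible(quality, 15)
```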
This section describes some of the research activities that will, I hope, help to turn bounded optimality into a creative tool for AI system design. First, however, I shall describe work on calculatively rational systems that needs to be done in order to enrich the space of agent programs.
As mentioned above, the correct design for a rational agent depends on the task environment—the “physical” environment and the performance measure on environment histories. It is possible to define some basic properties of task environments that, together with the complexity of the problem, lead to identifiable requirements on the corresponding rational agent designs (Russell and Norvig, 1995, Chapter 2). The principal properties are whether the environment is fully observable or partially observable, whether it is deterministic or stochastic, whether it is static (i.e., does not change except when the agent acts) or dynamic, and whether it is discrete or continuous. Although crude, these distinctions serve to lay out an agenda for basic research in AI. By analysing and solving each subcase and producing calculatively rational mechanisms with the required properties, theoreticians can produce the AI equivalent of bricks, beams, and mortar with which AI architects can build the equivalent of cathedrals. Unfortunately, many of the basic components are currently missing. Others are so fragile and non-scalable as to be barely able to support their own weight. This presents many opportunities for research of far-reaching impact.
The logicist tradition of goal-based agent design, based on the creation and execution of guaranteed plans, is firmly anchored in fully observable, deterministic, static, and discrete task environments. (Furthermore, tasks are usually specified as logically defined goals rather than general utility functions.) This means that agents need keep no internal state and can even execute plans without the use of perception.
The theory of optimal action in stochastic, partially observable environments goes under the heading of POMDPs (Partially Observable Markov Decision Problems), a class of problems first addressed in the work of Sondik (1971) but almost completely unknown in AI until recently (Cassandra et al., 1994). Similarly, very little work of a fundamental nature has been done in AI on dynamic environments, which require real-time decision making, or on continuous environments, which have been largely the province of geometry-based robotics. Since most real-world applications are partially observable, nondeterministic, dynamic, and continuous, the lack of emphasis is somewhat surprising.
There are, however, several new bricks under construction. For example, dynamic probabilistic networks (DPNs) (Dean and Kanazawa, 1989) provide a mechanism to maintain beliefs about the current state of a dynamic, partially observable, nondeterministic environment, and to project forward the effects of actions. Also, the rapid improvement in the speed and accuracy of computer vision systems has made interfacing with continuous physical environments more practical. In particular, the application of Kalman filtering (Kalman, 1960), a widely used technique in control theory, allows robust and efficient tracking of moving objects; DPNs extend Kalman filtering to allow more general representations of world state. Reinforcement learning, together with inductive learning methods for continuous function representations such as neural networks, allows learning from delayed rewards in continuous, nondeterministic environments. Recently, Parr and Russell (1995), among others, have had some success in applying reinforcement learning to partially observable environments. Finally, learning methods for static and dynamic probabilistic networks with hidden variables (i.e., for partially observable environments) may make it possible to acquire the necessary environment models (Lauritzen, 1995; Russell et al., 1995).
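As a sketch of the Kalman-filtering "brick", here is a minimal scalar filter tracking a (nearly) stationary position from noisy sightings. All noise parameters and observations are invented; real trackers use the full vector form with an explicit motion model.

```python
def kalman_step(x, p, z, q=0.01, r=0.5):
    """One predict-update cycle of a scalar Kalman filter.
    x, p: current state estimate and its variance
    z:    new noisy observation
    q, r: process and observation noise variances (assumed known)."""
    # Predict: the object is modelled as stationary plus process noise.
    p = p + q
    # Update: blend prediction and observation using the Kalman gain.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

x, p = 0.0, 1.0                          # vague initial estimate
for z in [0.9, 1.1, 1.0, 0.95, 1.05]:    # noisy sightings near position 1.0
    x, p = kalman_step(x, p, z)
# The estimate moves towards 1.0 and the variance shrinks with each sighting.
```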
The Bayesian Automated Taxi (a.k.a. BATmobile) project (Forbes et al., 1995) is an attempt to combine all these new bricks to solve an interesting application problem, namely driving a car on a freeway. Technically, this can be viewed as a POMDP because the environment contains relevant variables (such as whether or not the Volvo on your left is intending to change lanes to the right) that are not observable, and because the behaviour of other vehicles and the effects of one’s own actions are not exactly predictable. In a POMDP, the optimal decision depends on the joint probability distribution over the entire set of state variables. It turns out that a combination of real-time vision algorithms, Kalman filtering, and dynamic probabilistic networks can maintain the required distribution when observing a stream of traffic on a freeway. The BATmobile currently uses a hand-coded decision tree to make decisions on this basis, and is a fairly safe driver (although probably far from optimal) on our simulator. We are currently experimenting with lookahead methods to make approximately rational decisions, as well as supervised learning and reinforcement learning methods.
As well as extending the scope of AI applications, new bricks for planning under uncertainty significantly increase the opportunity for metareasoning to make a difference. With logical planners, a plan either does or does not work; it has proved very difficult to find heuristics to measure the “goodness” of a logical plan that does not guarantee success, or to estimate the likelihood that an abstract logical plan will have a successful concrete instance. This means that it is very hard to identify plan elaboration steps that are likely to have high value. In contrast, planners designed to handle uncertainty and utility have built-in information about the likelihood of success, and there is a continuum from hopeless to perfect plans. Getting metareasoning to work for such systems is a high priority. It is also important to apply methods, such as partial-order planning and abstraction, that have been so effective in extending the reach of classical planners.
Ongoing research on bounded optimality aims to extend the initial results of Russell and Subramanian (1995) to more interesting agent designs. In this section, I will sketch some design dimensions and the issues involved in establishing bounded optimality results.
The general scheme to be followed involves defining a virtual machine M that runs programs from a class ℒM. Typically, programs will have a “fixed part” that is shared across some subclass and a “variable part” that is specific to the individual program. Then comparisons are made between the best programs in different subclasses for the same machine. For example, suppose M is a machine capable of running any feedforward neural network. ℒM consists of all such networks, and we might be interested in comparing the subclasses defined by different network topologies, while within each subclass individual programs differ in the weights on the links of the network. Thus, the boundary between machine and program depends to some extent on the range of comparisons that the designer wishes to consider.
At the most general level of analysis, the methodology is now quite straightforward: choose a machine, choose a program that runs on the machine, then dump the resulting agent into a class of environments E. The program with the best performance is bounded optimal for M in E. For example, M is an IBM PC with a C compiler; ℒM consists of C programs up to a certain size; the environment consists of a population of human chess opponents; the performance measure is the chess rating achieved; the bounded optimal program is the one with the highest rating.
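At this level of generality the recipe is literally an argmax over programs. A toy instance with a finite program class and a finite environment class makes this explicit (all utility values and environment weights below are invented):

```python
# Invented value table V[program][environment]: expected utility of running
# each candidate program in each environment of the class.
V = {
    "prog_a": {"env1": 4.0, "env2": 1.0},
    "prog_b": {"env1": 3.0, "env2": 3.0},
    "prog_c": {"env1": 2.0, "env2": 2.5},
}

def bounded_optimal(V, env_weights):
    """Return the program maximizing expected value over the environment
    class, weighting environments by their assumed probabilities."""
    def value(prog):
        return sum(w * V[prog][e] for e, w in env_weights.items())
    return max(V, key=value)

best = bounded_optimal(V, {"env1": 0.5, "env2": 0.5})
```

For real program classes the table cannot be enumerated, which is exactly why the structure exploited in the next paragraph matters.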
This rather blunt and unenlightening approach has no doubt occurred to many engaged in the construction of chess programs. As stated, the problem is ridiculously hard to solve and the solution, once found, would be very domain-specific. The issue is to define a research agenda for bounded optimality that provides a little more guidance and generality. This can be done by exploiting structure in the definition of the problem, in particular the orthogonality of time and content, and by using more sophisticated agent designs, particularly those that incorporate mechanisms for adaptation and optimization. In this way, we can prove bounded optimality results for more general classes of task environments.
8.7.2.1 Mechanisms for optimization

Modular design using a hierarchy of components is commonly seen as the only way to build reliable complex systems. The components fulfill certain behavioural specifications and interact in well-defined ways. To produce a composite bounded-optimal design, the optimization problem involves allocating execution time to components (Zilberstein and Russell, 1996) or arranging the order of execution of the components (Russell and Subramanian, 1995) to maximize overall performance. As illustrated earlier in the discussion of universal ABO algorithms, the techniques for optimizing temporal behaviour are largely orthogonal to the content of the system components, which can therefore be optimized separately. Consider, for example, a composite system that uses an anytime inference algorithm over a belief network as one of its components. If a learning algorithm improves the accuracy of the belief network, the performance profile of the inference component will improve, which will result in a reallocation of execution time that is guaranteed to improve overall system performance. Thus, techniques such as the doubling construction and the time allocation algorithm in Zilberstein and Russell (1996) can be seen as domain-independent tools for agent design. They enable bounded optimality results that do not depend on the specific temporal aspects of the environment class. As a simple example, we might prove that a certain chess program design is ABO for all time controls ranging from blitz to full tournament play.
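A toy version of the execution-time allocation problem: given performance profiles for two anytime components composed in sequence, search for the split of a total time budget that maximizes overall quality. The profiles and the multiplicative composition rule are invented for illustration; Zilberstein and Russell (1996) treat the general problem.

```python
# Invented performance profiles: quality as a function of allocated time.
profile_a = lambda t: min(1.0, 0.2 * t)  # fast component, plateaus early
profile_b = lambda t: 1 - 0.5 ** t       # slower component, keeps improving

def best_split(total, steps=100):
    """Exhaustively search allocations t_a + t_b = total for the split
    maximizing an (assumed) multiplicative composition of qualities."""
    best_q, best_ta = -1.0, 0.0
    for i in range(steps + 1):
        ta = total * i / steps
        q = profile_a(ta) * profile_b(total - ta)
        if q > best_q:
            best_q, best_ta = q, ta
    return best_ta, best_q

ta, q = best_split(10.0)
# With these profiles, giving component A just enough time to plateau
# (5 units) and the rest to component B is the best allocation.
```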
The results obtained so far for optimal time allocation have assumed a static, offline optimization process with predictable component performance profiles and fixed connections among components. One can imagine far more subtle designs in which individual components must deal with unexpectedly slow or fast progress in processing and changing needs for information from other components. This might involve exchanging computational resources among components, establishing new interfaces, and so on. This is more reminiscent of a computational market, as envisaged by Wellman (1993), than of the classical subroutine hierarchies, and would offer a useful additional level of abstraction in system design.
8.7.2.2 Mechanisms for adaptation

In addition to combinatorial optimization of the structure and temporal behaviour of an agent, we can also use learning methods to improve the design.
Presumably, an agent architecture can incorporate all these learning mechanisms. One of the issues to be faced by bounded optimality research is how to prove convergence results when several adaptation and optimization mechanisms are operating simultaneously. A “quasistatic” approach, in which one mechanism reaches convergence before the other method is allowed to take its next step, seems theoretically adequate but not very practical.
8.7.2.3 Offline and online mechanisms

One can distinguish between offline and online mechanisms for constructing bounded-optimal agents. An offline construction mechanism is not itself part of the agent and is not the subject of bounded optimality constraints. Let C be an offline mechanism designed for a class of environments E. Then a typical theorem will say that C operates in a specific environment E ∈ E and returns an agent design that is ABO (say) for E—that is, an environment-specific agent.
In the online case, the mechanism C is considered part of the agent. Then a typical theorem will say that the agent is ABO for all E ∈ E. If the performance measure used is indifferent to the transient cost of the adaptation or optimization mechanism, the two types of theorems are essentially the same. On the other hand, if the cost cannot be ignored—for example, if an agent that learns quickly is to be preferred to an agent that reaches the same level of performance but learns more slowly—then the analysis becomes more difficult. It may become necessary to define asymptotic equivalence for “experience efficiency” in order to obtain robust results, as is done in computational learning theory.
It is worth noting that one can easily prove the value of “lifelong learning” in the ABO framework. An agent that devotes a constant fraction of its computational resources to learning-while-doing cannot do worse, in the ABO sense, than an agent that ceases learning after some point. If some improvement is still possible, the lifelong learning agent will always be preferred.
8.7.2.4 Fixed and variable computation costs

Another dimension of design space emerges when one considers the computational cost of the “variable part” of the agent design. The design problem is simplified considerably when the cost is fixed. Consider again the task of metalevel reinforcement learning, and to make things concrete let the metalevel decision be made by a Q-function mapping from computational state and action to value. Suppose further that the Q-function is to be represented by a neural net. If the topology of the neural net is fixed, then all Q-functions in the space have the same execution time. Consequently, the optimality criterion used by the standard Q-learning process coincides with bounded optimality, and the equilibrium reached will be a bounded-optimal configuration.⁴ On the other hand, if the topology of the network is subject to alteration as the design space is explored, then the execution time of the different Q-functions varies. In this case, the standard Q-learning process will not necessarily converge to a bounded-optimal configuration. A different adaptation mechanism must be found that takes into account the passage of time and its effect on utility.
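A tabular stand-in for the fixed-cost case can make this concrete. Here a fixed-size table plays the role of the fixed-topology network, and the environment is invented: the metalevel state is the number of deliberation steps taken so far, the actions are "continue deliberating" or "act now", and acting after s steps yields an invented payoff that trades decision quality against time cost.

```python
import random

MAX_STEPS = 5

# Invented trade-off: deliberating longer improves decision quality but
# incurs a time cost, so there is an interior optimal stopping point.
def payoff(steps):
    return (1 - 0.5 ** steps) - 0.1 * steps

def train(episodes=20000, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning of the metalevel choice: in state s (steps of
    deliberation so far), action 0 = keep deliberating, action 1 = act
    now and receive payoff(s)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(MAX_STEPS + 1)]
    for _ in range(episodes):
        s = 0
        while True:
            if s == MAX_STEPS:
                a = 1                      # must act eventually
            elif rng.random() < eps:
                a = rng.randrange(2)       # explore
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            if a == 1:
                Q[s][1] += alpha * (payoff(s) - Q[s][1])
                break
            Q[s][0] += alpha * (max(Q[s + 1]) - Q[s][0])
            s += 1
    return Q

Q = train()
```

Because every entry in the fixed table has the same execution cost, the equilibrium of ordinary Q-learning is also the bounded-optimal configuration; with these invented numbers the learned policy should settle on acting after three deliberation steps, the maximum of payoff.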
Whatever the solution to this problem turns out to be, the important point is that the notion of bounded optimality helps to distinguish adaptation mechanisms that will result in good performance from those that will not. Adaptation mechanisms derived from calculative rationality will fail in the more realistic setting where an agent cannot afford to aim for perfection.
8.7.2.5 Fully variable architectures

The discussion so far has been limited to fairly sedate forms of agent architecture in which the scope for adaptation is circumscribed to particular functional aspects such as metalevel Q-functions. However, an agent must in general deal with an environment that is far more complex than itself and that exhibits variation over time at all levels of granularity. Limits on the size of the agent’s memory may imply that almost complete revision of the agent’s mental structure is needed to achieve high performance. For example, one can imagine that a simple rule-based agent living through cycles of winter and summer may have to discard all of its summer rules as winter approaches, and then relearn them from scratch the following year. Such situations may engender a rethinking of some of our notions of agent architecture and optimality, and suggest a view of agent programs as dynamical systems with various amounts of compiled and uncompiled knowledge and internal processes of inductive learning, forgetting, and compilation.
8.7.2.6 Towards a grammar of AI systems

The approach that seems to be emerging for bounded optimality research is to divide up the space of agent designs into “architectural classes” such that in each class the structural variation is sufficiently limited. Then ABO results can be obtained either by analytical optimization within the class or by showing that an empirical adaptation process results in an approximately ABO design. Once this is done, it should be possible to compare architecture classes directly, perhaps to establish asymptotic dominance of one class over another. For example, it might be the case that the inclusion of an appropriate “macro-operator formation” or “greedy metareasoning” capability in a given architecture will result in an improvement in behaviour in the limit of very complex environments—that is, one cannot compensate for the exclusion of the capability by increasing the machine speed by a constant factor. A central tool in such work will be the use of “no-cost” results where, for example, the allocation of a constant fraction of computational resources to learning or metareasoning can do no harm to an agent’s ABO prospects.
Getting all these architectural devices to work together smoothly is an important unsolved problem in AI and must be addressed before we can make progress on understanding bounded optimality within these more complex architectural classes. If the notion of “architectural device” can be made sufficiently concrete, then AI may eventually develop a grammar for agent designs, describing the devices and their interrelations. As the grammar develops, so should the accompanying ABO dominance results.
I have outlined some directions for formally grounded AI research based on bounded optimality as the desired property of AI systems. This perspective on AI seems to be a logical consequence of the inevitable philosophical “move” from optimization over actions or computations to optimization over programs. I have suggested that such an approach should allow synergy between theoretical and practical AI research of a kind not afforded by other formal frameworks. In the same vein, I believe it is a satisfactory formal counterpart of the informal goal of creating intelligence. In particular, it is entirely consistent with our intuitions about the need for complex structure in real intelligent agents, the importance of the resource limitations faced by relatively tiny minds in large worlds, and the operation of evolution as a design optimization process. One can also argue that bounded optimality research is likely to satisfy better the needs of those who wish to emulate human intelligence, because it takes into account the limitations on computational resources that are presumably responsible for most of the regrettable deviation from perfect rationality exhibited by humans.
Bounded optimality and its asymptotic cousin are, of course, nothing but formally defined properties that one may want systems to satisfy. It is too early to tell whether ABO will do the same kind of work for AI that asymptotic complexity has done for theoretical computer science. Creativity in design is still the prerogative of AI researchers. It may, however, be possible to systematize the design process somewhat and to automate the process of adapting a system to its computational resources and the demands of the environment. The concept of bounded optimality provides a way to make sure the adaptation process is “correct”.
My hope is that with these kinds of investigations, it will eventually be possible to develop the conceptual and mathematical tools to answer some basic questions about intelligence. For example, why do complex intelligent systems (appear to) have declarative knowledge structures over which they reason explicitly? This has been a fundamental assumption that distinguishes AI from other disciplines for agent design, yet the answer is still unknown. Indeed, Rod Brooks, Hubert Dreyfus, and others flatly deny the assumption. What is clear is that it will need something like a theory of bounded optimal agent design to answer this question.
Most of the agent design features that I have discussed here, including the use of declarative knowledge, have been conceived within the standard methodology of “first build calculatively rational agents and then speed them up”. Yet one can legitimately doubt that this methodology will enable the AI community to discover all the design features needed for general intelligence. The reason is that no conceivable computer will ever be remotely close to approximating perfect rationality for even moderately complex environments. Perfect rationality is, if you like, a “Newtonian” definition for intelligent agents whereas the real world is a particle accelerator. It may well be the case that agents based on improvements to calculatively rational designs are not even close to achieving the level of performance that is potentially achievable given the underlying computational resources. For this reason, I believe it is imperative not to dismiss ideas for agent designs that do not seem at first glance to fit into the “classical” calculatively rational framework. Instead, one must attempt to understand the potential of the bounded optimal configurations within the corresponding architectural class, and to see if one can design the appropriate adaptation mechanisms that might help in realizing these configurations.
As mentioned in the previous section, there is also plenty of work to do in the area of making more general and more robust “bricks” from which to construct AI systems for more realistic environments, and such work will provide added scope for the achievement of bounded optimality. In a sense, under this conception AI research is the same now as it always should have been.
1. I view this as a very positive development. AI is a field defined by its problems, not its methods. Its principal insights—among them the learning, use, and compilation of explicit knowledge in the service of decision making—can certainly withstand the influx of new methods from other fields. This is especially true when other fields are simultaneously embracing the insights derived within AI.
2. Perhaps not coincidentally, this decision was taken before the question of computational intractability was properly understood in computer science.
3. Doyle and Patil (1991) propose instead the idea of “rational management of inference”. Representation systems “should be designed to offer a broad mix of services varying in cost and quality” and should take into account “the costs and benefits [of computations] as perceived by the system’s user”. That is, they suggest a solution based on rational metareasoning, as discussed in Section 8.5.
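The cost-benefit idea behind rational metareasoning can be rendered as a toy metalevel loop: deliberation continues only while the estimated value of one more computation step exceeds its time cost. The performance profile and per-step cost below are hypothetical, chosen only to make the stopping rule visible.

```python
TIME_COST = 0.02  # hypothetical cost per unit of deliberation

def quality(steps):
    # Diminishing-returns performance profile (illustrative only).
    return 1.0 - 0.5 ** (steps + 1)

def deliberate(max_steps=50):
    """Metalevel control: compute only while it is expected to pay."""
    steps = 0
    while steps < max_steps:
        value_of_computation = quality(steps + 1) - quality(steps)
        if value_of_computation <= TIME_COST:
            break  # another step would cost more than it gains
        steps += 1
    return steps
```

With these numbers the loop halts after four steps, when the marginal gain (about 0.016) first falls below the per-step cost.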
4. A similar observation was made by Horvitz and Breese (1990) for cases where the object level is so restricted that the metalevel decision problem can be solved in constant time.
Jerry A. Fodor
1983
[T]he questions we now want to ask can be put like this: Are there psychological processes that can plausibly be assumed to cut across cognitive domains? And, if there are, is there reason to suppose that such processes are subserved by nonmodular (e.g., informationally unencapsulated) mechanisms?
The answer to the first of these questions is, I suppose, reasonably clear. Even if input systems are domain specific, there must be some cognitive mechanisms that are not. The general form of the argument goes back at least to Aristotle: the representations that input systems deliver have to interface somewhere, and the computational mechanisms that effect the interface must ipso facto have access to information from more than one cognitive domain. Consider:
(a) We have repeatedly distinguished between what the input systems compute and what the organism (consciously or subdoxastically) believes. Part of the point of this distinction is that input systems, being informationally encapsulated, typically compute representations of the distal layout on the basis of less information about the distal layout than the organism has available. Such representations want correction in light of background knowledge (e.g., information in memory) and of the simultaneous results of input analysis in other domains (see Aristotle on the ‘common sense’). Call the process of arriving at such corrected representations “the fixation of perceptual belief.” To a first approximation, we can assume that the mechanisms that effect this process work like this: they look simultaneously at the representations delivered by the various input systems and at the information currently in memory, and they arrive at a best (i.e., best available) hypothesis about how the world must be, given these various sorts of data.1 But if there are mechanisms that fix perceptual belief, and if they work in anything like this way, then these mechanisms are not domain specific. Indeed, the point of having them is precisely to ensure that, wherever possible, what the organism believes is determined by all the information it has access to, regardless of which cognitive domains this information is drawn from.
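The process sketched in (a), looking simultaneously at what several input systems deliver and at what memory supplies, then settling on a best overall hypothesis, can be caricatured in a few lines. The hypotheses, evidence sources, and probabilities are all invented for illustration.

```python
import math

# Hypothetical deliverances of two encapsulated input systems plus a prior
# drawn from memory; every name and number here is illustrative.
EVIDENCE = {
    "dog": {"vision": 0.6, "audition": 0.8, "memory_prior": 0.3},
    "fox": {"vision": 0.3, "audition": 0.1, "memory_prior": 0.1},
    "cat": {"vision": 0.1, "audition": 0.1, "memory_prior": 0.6},
}

def fix_belief(evidence):
    """Unencapsulated combination: every source bears on every hypothesis."""
    def score(hypothesis):
        return sum(math.log(p) for p in evidence[hypothesis].values())
    return max(evidence, key=score)
```

No single source settles the matter: here vision and audition favor "dog" strongly enough to override a memory prior that favors "cat", which is exactly the cross-domain correction the text describes.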
(b) We use language (inter alia) to communicate our views on how the world is. But this use of language is possible only if the mechanisms that mediate the production of speech have access to what we see (or hear, or remember, or think) that the world is like. Since, by assumption, such mechanisms effect an interface among vertical faculties, they cannot themselves be domain specific. More precisely, they must at least be less domain specific than the vertical faculties are.2
(c) One aspect of the ‘impenetrability’ of the input systems is, we assumed, their insensitivity to the utilities of the organism. This assumption was required in part to explain the veridicality of perception given that the world doesn’t always prove to be the way that we would prefer it to be. However, an interface between perception and utilities must take place somewhere if we are to use the information that input systems deliver in order to determine how we ought to act. (Decision theories are, to all intents and purposes, models of the structure of this interface. The point is, roughly, that wishful seeing is avoided by requiring interactions with utilities to occur after—not during—perceptual integration.) So, again, the moral seems to be that there must be some mechanisms which cross the domains that input systems establish.
For these and other similar reasons, I assume that there must be relatively nondenominational (i.e., domain-inspecific) psychological systems which operate, inter alia, to exploit the information that input systems provide. Following the tradition, I shall call these “central” systems, and I will assume that it is the operation of these sorts of systems that people have in mind when they talk, pretheoretically, of such mental processes as thought and problem-solving. Central systems may be domain specific in some sense—we will consider this when we get to the issues about ‘epistemic boundedness’—but at least they aren’t domain specific in the way that input systems are. The interesting question about the central systems is whether, being nondenominational, they are also nonmodular in other respects as well. That is, whether the central systems fail to exhibit the galaxy of properties that lead us to think of the input systems as a natural kind…3
Briefly, my argument is going to be this: we have seen that much of what is typical of the input systems is more or less directly a product of their informational encapsulation. By contrast, I’ll claim that central systems are, in important respects, unencapsulated, and that it is primarily for this reason that they are not plausibly viewed as modular. Notice that I am not going to be arguing for a tautology. It is perfectly possible, in point of logic, that a system which is not domain specific might nevertheless be encapsulated. Roughly, domain specificity has to do with the range of questions for which a device provides answers (the range of inputs for which it computes analyses); whereas encapsulation has to do with the range of information that the device consults in deciding what answers to provide. A system could thus be domain specific but unencapsulated (it answers a relatively narrow range of questions but in doing so it uses whatever it knows); and a system could be nondenominational but encapsulated (it will give some answer to any question; but it gives its answers off the top of its head—i.e., by reference to less than all the relevant information). If, in short, it is true that only domain-specific systems are encapsulated, then that truth is interesting. Perhaps it goes without saying that I am not about to demonstrate this putative truth. I am, however, about to explore it.
So much for what I’m going to be arguing for. Now a little about the strategy of the argument. The fact is that there is practically no direct evidence, pro or con, on the question whether central systems are modular. No doubt it is possible to achieve some gross factoring of “intelligence” into “verbal” versus “mathematical/spatial” capacities; and no doubt there is something to the idea of a corresponding hemispheric specialization. But such dichotomies are very gross and may themselves be confounded with the modularity of the input systems—that is to say, they give very little evidence for the existence of domain-specific (to say nothing of modular) systems other than the ones that subserve the functions of perceptual and linguistic analysis.
When you run out of direct evidence, you might just as well try arguing from analogies, and that is what I propose to do. I have been assuming that the typical function of central systems is the fixation of belief (perceptual or otherwise) by nondemonstrative inference. Central systems look at what the input systems deliver, and they look at what is in memory, and they use this information to constrain the computation of ‘best hypotheses’ about what the world is like. These processes are, of course, largely unconscious, and very little is known about their operation. However, it seems reasonable enough that something can be inferred about them from what we know about explicit processes of nondemonstrative inference—viz., from what we know about empirical inference in science. So, here is how I am going to proceed. First, I’ll suggest that scientific confirmation—the nondemonstrative fixation of belief in science—is typically unencapsulated. I’ll then argue that if, pursuing the analogy, we assume that the central psychological systems are also unencapsulated, we get a picture of those systems that is, anyhow, not radically implausible given such information about them as is currently available.
The nondemonstrative fixation of belief in science has two properties which, though widely acknowledged, have not (so far as I know) yet been named. I shall name them: confirmation in science is isotropic and it is Quineian. It is notoriously hard to give anything approaching a rigorous account of what being isotropic and Quineian amounts to, but it is easy enough to convey the intuitions.
By saying that confirmation is isotropic, I mean that the facts relevant to the confirmation of a scientific hypothesis may be drawn from anywhere in the field of previously established empirical (or, of course, demonstrative) truths. Crudely: everything that the scientist knows is, in principle, relevant to determining what else he ought to believe. In principle, our botany constrains our astronomy, if only we could think of ways to make them connect.
As is usual in a methodological inquiry, it is possible to consider the isotropy of confirmation either normatively (as a principle to which we believe that rational inductive practice ought to conform) or sociologically (as a principle which working scientists actually adhere to in assessing the degree of confirmation of their theories). In neither case, however, should we view the isotropy of confirmation as merely gratuitous—or, to use a term of Rorty’s (1979), as merely “optional.” If isotropic confirmation ‘partially defines the language game that scientists play’ (remember when we used to talk that way?), that is because of a profound conviction—partly metaphysical and partly epistemological—to which scientists implicitly subscribe: the world is a connected causal system and we don’t know how the connections are arranged. Because we don’t, we must be prepared to abandon previous estimates of confirmational relevance as our scientific theories change. The point of all this is: confirmational isotropy is a reasonable property for nondemonstrative inference to have because the goal of nondemonstrative inference is to determine the truth about a causal mechanism—the world—of whose workings we are arbitrarily ignorant. That is why our institution of scientific confirmation is isotropic, and it is why it is plausible to suppose that what psychologists call “problem-solving” (i.e., nondemonstrative inference in the service of individual fixation of belief) is probably isotropic too.
The isotropy of scientific confirmation has sometimes been denied, but never, I think, very convincingly. For example, according to some historians it was part of the Aristotelian strategy against Galileo to claim that no data other than observations of the movements of astronomical objects could, in principle, be relevant to the (dis)confirmation of the geocentric theory. Telescopic observations of the phases of Venus were thus ruled irrelevant a priori. In notably similar spirit, some linguists have recently claimed that no data except certain specified kinds of facts about the intuitions of native speakers could, in principle, be relevant to the (dis)confirmation of grammatical theories. Experimental observations from psycholinguistics are thus ruled irrelevant a priori. However, this sort of methodology seems a lot like special pleading: you tend to get it precisely when cherished theories are in trouble from prima facie disconfirming data. Moreover, it often comports with Conventionalist construals of the theories so defended. That is, theories for which nonisotropic confirmation is claimed are often viewed, even by their proponents, as merely mechanisms for making predictions; what is alleged in their favor is predictive adequacy rather than correspondence to the world. (Viewed from our perspective, nonisotropic confirmation is, to that extent, not a procedure for fixation of belief, since, on the Conventionalist construal, the predictive adequacy of a theory is not a reason for believing that the theory is true.)
One final thought on the isotropy issue. We are interested in isotropic systems because such systems are ipso facto unencapsulated. We are interested in scientific confirmation because (a) there is every reason to suppose that it is isotropic; (b) there is every reason to suppose that it is a process fundamentally similar to the fixation of belief; and (c) it is perhaps the only “global”, unencapsulated, wholistic cognitive process about which anything is known that’s worth reporting. For all that, scientific confirmation is probably not the best place to look if you want to see cognitive isotropy writ large. The best place to look, at least if one is willing to trust the anecdotes, is scientific discovery.
What the anecdotes say about scientific discovery—and they say it with a considerable show of univocality (see, e.g., papers in Ortony, 1979)—is that some sort of ‘analogical reasoning’ often plays a central role. It seems to me that we are thoroughly in the dark here, so I don’t propose to push this point very hard. But it really does look as though there have been frequent examples in the history of science where the structure of theories in a new subject area has been borrowed from, or at least suggested by, theories in situ in some quite different domain: what’s known about the flow of water gets borrowed to model the flow of electricity; what’s known about the structure of the solar system gets borrowed to model the structure of the atom; what’s known about the behavior of the market gets borrowed to model the process of natural selection, which in turn gets borrowed to model the shaping of operant responses. And so forth. The point about all this is that “analogical reasoning” would seem to be isotropy in the purest form: a process which depends precisely upon the transfer of information among cognitive domains previously assumed to be mutually irrelevant. By definition, encapsulated systems do not reason analogically.
I want to suggest two morals before I leave this point. The first is that the closer we get to what we are pretheoretically inclined to think of as the ‘higher,’ ‘more intelligent’, less reflexive, less routine exercises of cognitive capacities, the more such global properties as isotropy tend to show up. I doubt that this is an accident. I suspect that it is precisely its possession of such global properties that we have in mind when we think of a cognitive process as paradigmatically intelligent. The second moral foreshadows a point that I shall jump up and down about further on. It is striking that, while everybody thinks that analogical reasoning is an important ingredient in all sorts of cognitive achievements that we prize, nobody knows anything about how it works; not even in the dim, in-a-glass-darkly sort of way in which there are some ideas about how confirmation works. I don’t think that this is an accident either. In fact, I should like to propose a generalization; one which I fondly hope will some day come to be known as ‘Fodor’s First Law of the Nonexistence of Cognitive Science’. It goes like this: the more global (e.g., the more isotropic) a cognitive process is, the less anybody understands it. Very global processes, like analogical reasoning, aren’t understood at all. More about such matters in the last part of this discussion.
By saying that scientific confirmation is Quineian, I mean that the degree of confirmation assigned to any given hypothesis is sensitive to properties of the entire belief system; as it were, the shape of our whole science bears on the epistemic status of each scientific hypothesis. Notice that being Quineian and being isotropic are not the same properties, though they are intimately related. For example, if scientific confirmation is isotropic, it is quite possible that some fact about photosynthesis in algae should be relevant to the confirmation of some hypothesis in astrophysics (“the universe in a grain of sand” and all that). But the point about being Quineian is that we might have two astrophysical theories, both of which make the same predictions about algae and about everything else that we can think of to test, but such that one of the theories is better confirmed than the other—e.g., on grounds of such considerations as simplicity, plausibility, or conservatism. The point is that simplicity, plausibility, and conservatism are properties that theories have in virtue of their relation to the whole structure of scientific beliefs taken collectively. A measure of conservatism or simplicity would be a metric over global properties of belief systems.
Consider, by way of a simple example, Goodman’s original (1954) treatment of the notion of projectability. We know that two hypotheses that are equivalent in respect of all the available data may nevertheless differ in their level of confirmation depending on which is the more projectable. Now, according to Goodman’s treatment, the projectability of a hypothesis is inherited (at least in part) from the projectability of its vocabulary, and the projectability of an item of scientific vocabulary is determined by the (weighted?) frequency with which that item has been projected in previously successful scientific theories. So, the whole history of past projections contributes to determining the projectability of any given hypothesis on Goodman’s account, and the projectability of a hypothesis (partially) determines its level of confirmation. Similarly with such notions as simplicity, conservatism, and the rest if only we knew how to measure them.
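A toy rendering of this account: each predicate's projectability is estimated from its record in past projections, and a hypothesis inherits the weakest score in its vocabulary. The history table and the inheritance rule (a simple minimum, standing in for Goodman's more careful story) are assumptions made purely for illustration.

```python
# Illustrative projection records for three predicates; the counts are invented.
PROJECTION_HISTORY = {
    "green": {"projected": 120, "successful": 110},
    "blue":  {"projected": 95,  "successful": 90},
    "grue":  {"projected": 2,   "successful": 1},
}

def term_projectability(term):
    record = PROJECTION_HISTORY[term]
    if record["projected"] == 0:
        return 0.0
    return record["successful"] / record["projected"]

def hypothesis_projectability(vocabulary):
    # A hypothesis is only as projectable as its least projectable term.
    return min(term_projectability(term) for term in vocabulary)
```

On these invented counts, a hypothesis couched in "green" and "blue" outranks an evidentially equivalent rival couched in "grue", which is the Quineian point: the whole history of past projections, not the data alone, bears on the level of confirmation.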
The idea that scientific confirmation is Quineian is by no means untendentious. On the contrary, it was a legacy of traditional philosophy of science—one of the “dogmas of Empiricism” (Quine, 1951)—that there must be semantic connections between each theory statement and some data statements. That is, each hypothesis about “unobservables” must entail some predictions about observables, such entailments holding in virtue of the meanings of the theoretical terms that the hypotheses contain.4 The effect of postulating such connections would be to determine a priori that certain data would disconfirm certain hypotheses, whatever the shape of the rest of one’s science might be. For, of course, if H entails O, the discovery that ¬O would entail that ¬H. To that extent, the (dis)confirmation of H by ¬O is independent of global features of the belief system that H and O belong to. To postulate meaning relations between data statements and theory statements is thus to treat confirmation as a local phenomenon rather than a global one.
I emphasize this consideration because analogous semantic proposals can readily be found in the psychological literature. For example, in the sorts of cognitive theories espoused by, say, Bruner or Vygotsky (and, more recently, in the work of the “procedural” semanticists), it is taken for granted that there must be connections of meaning between ‘concepts’ and ‘percepts’. Basically, according to such theories, concepts are recipes for sorting stimuli into categories. Each recipe specifies a (more or less determinate) galaxy of tests that one can perform to effect a sorting, and each stimulus category is identified with a (more or less determinate) set of outcomes of the tests. To put the idea crudely but near enough for present purposes, there’s a rule that you can test for dog by finding out if a thing barks, and the claim is that this rule is constitutive (though not, of course, exhaustive) of the concept dog. Since it is alleged to be a conceptual truth that whether it barks is relevant to whether it’s a dog, it follows that the confirmation relation between “a thing is a dog” and “it barks” is insensitive to global properties of one’s belief system. So considerations of theoretical simplicity etc. could not, even in principle, lead to the conclusion that whether it barks is irrelevant to whether it’s a dog. To embrace that conclusion would be to change the concept.
This sort of example makes it clear how closely related being Quineian and being isotropic are. Since, on the view just scouted, it is a matter of meaning that barking is relevant to dogness, it is not possible to discover on empirical grounds that one was wrong about that relevancy relation. But isotropy is the principle that any fact may turn out to be (ir)relevant to the confirmation of any other. The Bruner-Vygotsky-procedural semantics line is thus incompatible with the isotropy of confirmation as well as with its Quineianness.
In saying that confirmation is isotropic and Quineian, I am thus consciously disagreeing with major traditions in the philosophy of science and in cognitive psychology. Nevertheless, I shall take it for granted that scientific confirmation is Quineian and isotropic. (Those who wish to see the arguments should refer to such classic papers in the modern philosophy of science as Quine, 1951, and Putnam, 1962.) Moreover, since I am committed to relying upon the analogy between scientific confirmation and psychological fixation of belief, I shall take it for granted that the latter must be Quineian and isotropic too, hence that the Bruner-Vygotsky-procedural semantics tradition in cognitive psychology must be mistaken. I propose, at this point, to be both explicit and emphatic. The argument is that the central processes which mediate the fixation of belief are typically processes of rational nondemonstrative inference and that, since processes of rational nondemonstrative inference are Quineian and isotropic, so too are central processes. In particular, the theory of such processes must be consonant with the principle that the level of acceptance of any belief is sensitive to the level of acceptance of any other and to global properties of the field of beliefs taken collectively.
Given these assumptions, I have now got two things to do: I need to show that this picture of the central processes is broadly incompatible with the assumption that they are modular, and I need to show that it is a picture that has some plausibility independent of the putative analogy between cognitive psychology and the philosophy of science.
I take it that the first of these claims is relatively uncontroversial. We argued that modularity is fundamentally a matter of informational encapsulation and, of course, informationally encapsulated is precisely what Quineian/isotropic systems are not. When we discussed input systems, we thought of them as mechanisms for projecting and confirming hypotheses. And we remarked that, viewed that way, the informational encapsulation of such systems is tantamount to a constraint on the confirmation metrics that they employ; the confirmation metric of an encapsulated system is allowed to ‘look at’ only a certain restricted class of data in determining which hypothesis to accept. If, in particular, the flow of information through such a system is literally bottom-to-top, then its informational encapsulation consists in the fact that the ith-level hypotheses are (dis)confirmed solely by reference to lower-than-ith level representations. And even if the flow of data is unconstrained within a module, encapsulation implies constraints upon the access of intramodular processes to extramodular information sources. Whereas, by contrast, isotropy is by definition the property that a system has when it can look at anything it knows about in the course of determining the confirmation levels of hypotheses. So, in general, the more isotropic a confirmation metric is, the more heterogeneous the provenance of the data that it accepts as relevant to constraining its decisions. Scientific confirmation is isotropic in the limit in this respect; it provides a model of what the nonmodular fixation of belief is like.
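The contrast can be put in code: an encapsulated confirmation metric may consult only a fixed proprietary database, while an isotropic one may consult anything the system believes. The beliefs, weights, and scoring rule below are purely illustrative.

```python
# Everything the system believes, with illustrative evidential weights.
BELIEFS = {
    "retinal_edge_map": 0.9,     # low-level, intramodular datum
    "lighting_is_dim": 0.7,      # background knowledge
    "speaker_said_shadow": 0.8,  # output of a different input system
}

# An encapsulated module's confirmation metric sees only this fixed subset.
MODULE_DATABASE = {"retinal_edge_map"}

def confirmation(hypothesis_fit, accessible):
    """Average evidential support over whatever data the metric may consult."""
    return hypothesis_fit * sum(BELIEFS[b] for b in accessible) / len(accessible)

encapsulated = confirmation(0.5, MODULE_DATABASE)  # restricted data class
isotropic = confirmation(0.5, BELIEFS)             # the whole belief store
```

The two metrics can assign different confirmation levels to the same hypothesis; what matters is that the isotropic score depends on data of heterogeneous provenance, while the encapsulated score depends only on the module's fixed database.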
Similarly with being Quineian. Quineian confirmation metrics are ipso facto sensitive to global properties of belief systems. Now, an informationally encapsulated system could, strictly speaking, nevertheless be Quineian. Simplicity, for example, could constrain confirmation even in a system which computes its simplicity scores over some arbitrarily selected subset of beliefs. But this is mere niggling about the letter. In spirit, global criteria for the evaluation of hypotheses comport most naturally with isotropic principles for the relevance of evidence. Indeed, it is only on the assumption that the selection of evidence is isotropic that considerations of simplicity (and other such global properties of hypotheses) are rational determinants of belief. It is epistemically interesting that H & T is a simpler theory than ¬H & T where H is a hypothesis to be evaluated and T is the rest of what one believes. But there is no interest in the analogous consideration where T is some arbitrarily delimited subset of one’s beliefs. Where relevance is non-isotropic, assessments of relative simplicity can be gerrymandered to favor any hypothesis one likes. This is one of the reasons why the operation of (by assumption informationally encapsulated) input systems should not be identified with the fixation of perceptual belief; not, at least, by those who wish to view the fixation of perceptual belief as by and large a rational process.
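Fodor's observation that relative-simplicity judgments can be gerrymandered when evidence selection is non-isotropic can be made concrete with a toy score. Everything here, the metric, the hypothesis contents, and the background atoms, is an invention for this sketch, not anything Fodor himself proposes:

```python
# Toy illustration: a crude "simplicity" score that counts how many primitive
# notions a hypothesis H adds beyond the background theory T it is evaluated
# against. All names below are invented for the sketch.

def novelty_cost(hypothesis_atoms, background_atoms):
    """Number of primitives H introduces that T does not already contain;
    lower cost = "simpler" relative to that background."""
    return len(set(hypothesis_atoms) - set(background_atoms))

# Two rival hypotheses, stated in terms of primitive notions.
H1 = {"force", "mass"}
H2 = {"phlogiston"}

# The whole of what one believes already talks about force and mass.
full_T = {"force", "mass", "velocity", "heat"}

# Evaluated against everything one believes, H1 is the simpler addition.
assert novelty_cost(H1, full_T) < novelty_cost(H2, full_T)

# But over an arbitrarily delimited subset of beliefs, the verdict flips.
gerrymandered_T = {"phlogiston"}
assert novelty_cost(H1, gerrymandered_T) > novelty_cost(H2, gerrymandered_T)
```

Because nothing constrains the choice of the background subset, either ordering can be produced at will; this is the sense in which a rational use of simplicity presupposes isotropic access to the whole of T.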
So it seems clear that isotropic/Quineian systems are ipso facto unencapsulated; and if unencapsulated, then presumably nonmodular. Or rather, since this is all a matter of degree, we had best say that to the extent that a system is Quineian and isotropic, it is also nonmodular. If, in short, isotropic and Quineian considerations are especially pressing in determining the course of the computations that central systems perform, it should follow that these systems differ in their computational character from the vertical faculties.
We are coming close to what we started out to find: an overall taxonomy of cognitive systems. According to the present proposal, there are, at a minimum, two families of such systems: modules (which are, relatively, domain specific and encapsulated) and central processes (which are, relatively, domain neutral and isotropic/Quineian). We have suggested that the characteristic function of modular cognitive systems is input analysis and that the characteristic function of central processes is the fixation of belief. If this is right, then we have three ways of taxonomizing cognitive processes which prove to be coextensive:
Functional taxonomy: input analysis versus fixation of belief
Taxonomy by subject matter: domain specific versus domain neutral
Taxonomy by computational character: encapsulated versus Quineian/isotropic
I repeat that this coextension, if it holds at all, holds contingently. Nothing in point of logic stops one from imagining that these categories cross-classify the cognitive systems. If they do not, then that is a fact about the structure of the mind. Indeed, it is a deep fact about the structure of the mind.
All of which would be considerably more impressive if there were better evidence for the view of central processes that I have been proposing. Thus far, that account rests entirely on the analogy between psychological processes of belief fixation and a certain story about the character of scientific confirmation. There is very little that I can do about this, given the current underdeveloped state of psychological theories of thought and problem-solving. For what it’s worth, however, I want to suggest two considerations that seem relevant and promising.
The first is that the difficulties we encounter when we try to construct theories of central processes are just the sort we would expect to encounter if such processes are, in essential respects, Quineian/isotropic rather than encapsulated. The crux in the construction of such theories is that there seems to be no way to delimit the sorts of informational resources which may affect, or be affected by, central processes of problem-solving. We can’t, that is to say, plausibly view the fixation of belief as effected by computations over bounded, local information structures. A graphic example of this sort of difficulty arises in AI, where it has come to be known as the “frame problem” (i.e., the problem of putting a “frame” around the set of beliefs that may need to be revised in light of specified newly available information. Cf. the discussion in McCarthy and Hayes (1969), from which the following example is drawn).
To see what’s going on, suppose you were interested in constructing a robot capable of coping with routine tasks in familiar human environments. In particular, the robot is presented with the job of phoning Mary and finding out whether she will be late for dinner. Let’s assume that the robot ‘knows’ it can get Mary’s number by consulting the directory. So it looks up Mary’s number and proceeds to dial. So far, so good. But now, notice that commencing to dial has all sorts of direct and indirect effects on the state of the world (including, of course, the internal state of the robot), and some of these effects are ones that the device needs to keep in mind for the guidance of its future actions and expectations. For example, when the dialing commences, the phone ceases to be free to outside calls; the robot’s fingers (or whatever) undergo appropriate alterations of spatial location; the dial tone cuts off and gets replaced by beeps; something happens in a computer at Murray Hill; and so forth. Some (but, in principle, not all) such consequences are ones that the robot must be designed to monitor since they are relevant to “updating” beliefs upon which it may eventually come to act. Well, which consequences? The problem has at least the following components. The robot must be able to identify, with reasonable accuracy, those of its previous beliefs whose truth values may be expected to alter as a result of its current activities; and it must have access to systems that do whatever computing is involved in effecting the alterations.
Notice that, unless these circuits are arranged correctly, things can go absurdly wrong. Suppose that, having consulted the directory, the robot has determined that Mary’s number is 222-2222, which number it commences to dial, pursuant to instructions previously received. But now it occurs to the machine that one of the beliefs that may need updating in consequence of its having commenced dialing is its (recently acquired) belief about Mary’s telephone number. So, of course, it stops dialing and goes and looks up Mary’s telephone number (again). Repeat, da capo, as many times as may amuse you. Clearly, we have here all the makings of a computational trap. Unless the robot can be assured that some of its beliefs are invariant under some of its actions, it will never get to do anything.
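A minimal caricature of the trap, with all names invented for illustration: a robot that treats its belief about Mary's number as potentially stale after every step of dialing never finishes dialing, whereas a declared invariance assumption lets it act.

```python
# Sketch of the dialing loop. "Reconsidering" a belief is itself an action
# that puts the belief back in question, so without an invariance assumption
# the robot cycles forever.

def dial(number, beliefs, invariant, max_steps=10):
    """Try to dial; re-derive any non-invariant belief after each step.
    Returns the number of cycles before dialing completes, or None if the
    robot is still reconsidering after max_steps (the computational trap)."""
    steps = 0
    while steps < max_steps:
        steps += 1
        stale = [b for b in beliefs if b not in invariant]
        if not stale:
            return steps  # nothing left to reconsider: dialing completes
        # Otherwise: look the stale beliefs up again, which restarts the cycle.
    return None

beliefs = {"marys_number": "222-2222"}

# No invariance assumption: the lookup undermines itself indefinitely.
assert dial("222-2222", beliefs, invariant=set()) is None

# Declaring the number invariant under the act of dialing breaks the loop.
assert dial("222-2222", beliefs, invariant={"marys_number"}) == 1
```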
How, then, does the machine’s program determine which beliefs the robot ought to reevaluate given that it has embarked upon some or other course of action? What makes this problem so hard is precisely that it seems unlikely that any local solution will do the job. For example, the following truths appear to be self-evident: First, that there is no fixed set of beliefs such that, for any action, those and only those beliefs are the ones that require reconsideration. (That is, which beliefs are up for grabs depends intimately upon which actions are performed and upon the context of the performances. There are some—indeed, indefinitely many—actions which, if performed, should lead one to consider the possibility that Mary’s telephone number has changed in consequence.) Second, new beliefs don’t come docketed with information about which old beliefs they ought to affect. On the contrary, we are forever being surprised by the implications of what we know, including, of course, what we know about the actions we perform. Third, the set of beliefs apt for reconsideration cannot be determined by reference to the recency of their acquisition, or by reference to their generality, or by reference to merely semantic relations between the contents of the beliefs and the description under which the action is performed…etc. Should any of these propositions seem less than self-evident, consider the special case of the frame problem where the robot is a mechanical scientist and the action performed is an experiment. 
Here the question ‘which of my beliefs ought I to reconsider given the possible consequences of my action’ is transparently equivalent to the question “What, in general, is the optimal adjustment of my beliefs to my experiences?” This is, of course, exactly the question that a theory of confirmation is supposed to answer; and, as we have been at pains to notice, confirmation is not a relation reconstructible by reference to local properties of hypotheses or of the data that bear upon them.
I am suggesting that, as soon as we begin to look at cognitive processes other than input analysis—in particular, at central processes of nondemonstrative fixation of belief—we run into problems that have a quite characteristic property. They seem to involve isotropic and Quineian computations; computations that are, in one or other respect, sensitive to the whole belief system. This is exactly what one would expect on the assumption that nondemonstrative fixation of belief really is quite like scientific confirmation, and that scientific confirmation is itself characteristically Quineian and isotropic. In this respect, it seems to me, the frame problem is paradigmatic, and in this respect the seriousness of the frame problem has not been adequately appreciated.
For example, Raphael (1971) comments as follows: “(An intelligent robot) will have to be able to carry out tasks. Since a task generally involves some change in the world, it must be able to update its model (of the world) so it remains as accurate during and after the performance of a task as it was before. Moreover, it must be able to plan how to carry out a task, and this planning process usually requires keeping ‘in mind’ simultaneously a variety of possible actions and corresponding models of hypothetical worlds that would result from those actions. The bookkeeping problems involved with keeping track of these hypothetical worlds account for much of the difficulty of the frame problem” (p. 159). This makes it look as though the problem is primarily (a) how to notate the possible worlds and (b) how to keep track of the demonstrative consequences of changing state descriptions. But the deeper problem, surely, is to keep track of the nondemonstrative consequences. Slightly more precisely, the problem is, given an arbitrary belief world W and a new state description ‘a is F’, what is the appropriate successor belief world W’? What ought the device to believe, given that it used to believe W and now believes that a is F? But this isn’t just a bookkeeping problem; it is the general problem of inductive confirmation.5
So far as I can tell, the usual assumption about the frame problem in AI is that it is somehow to be solved ‘heuristically’. The idea is that, while nondemonstrative confirmation (and hence, presumably, the psychology of belief fixation) is isotropic and Quineian in principle, still, given a particular hypothesis, there are, in practice, heuristic procedures for determining the range of effects its acceptance can have on the rest of one’s beliefs. Since these procedures are by assumption merely heuristic, they may be assumed to be local—i.e., to be sensitive to less than the whole of the belief systems to which they apply. Something like this may indeed be true; there is certainly considerable evidence for heuristic short-cutting in belief fixation, deriving both from studies of the psychology of problem-solving (for a recent review, see Nisbett and Ross, 1980) and from the sociology of science (Kuhn, 1970). In such cases, it is possible to show how potentially relevant considerations are often systematically ignored, or distorted, or misconstrued in favor of relatively local (and, of course, highly fallible) problem-solving strategies. Perhaps a bundle of such heuristics, properly coordinated and rapidly deployed, would suffice to make the central processes of a robot as Quineian and isotropic as yours, or mine, or the practicing scientist’s ever actually succeed in being. Since there are, at present, no serious proposals about what heuristics might belong to such a bundle, it seems hardly worth arguing the point.
Still, I am going to argue it a little.
There are those who hold that ideas recently evolved in AI—such notions as, e.g., those of ‘frame’ (see Minsky, 1975)6 or ‘script’ (see Schank and Abelson, 1975)—will illuminate the problems about the globality of belief fixation since they do, in a certain sense, provide for placing a frame around the body of information that gets called when a given sort of problem is encountered. (For a discussion that runs along these optimistic lines, see Thagard.) It seems to me, however, that the appearance of progress here is entirely illusory—a prime case of confusing a notation with a theory.
If there were a principled solution to the frame problem, then no doubt that solution could be expressed as a constraint on the scripts, or frames, to which a given process of induction has access. But, lacking such a solution, there is simply no content to the idea that only the information represented in the frame (/script) that a problem elicits is computationally available for solving the problem. For one thing, since there are precisely no constraints on the individuation of frames (/scripts), any two pieces of information can belong to the same frame (/script) at the discretion of the programmer. This is just a way of saying that the solution of the frame problem can be accommodated to the frame (/script) notation whatever that solution turns out to be. Which is just another way of saying that the notation does not constrain the solution. Second, it is a widely advertised property of frames (/scripts) that they can cross-reference to one another. The frame for Socrates says, among other things, ‘see Plato’…and so forth. There is no reason to doubt that, in any developed model, the system of cross-referencing would imply a graph in which there is a route (of greater or lesser length) from each point to any other. But now we have the frame problem all over again, in the form: Which such paths should actually be traversed in a given case of problem-solving, and what should bound the length of the trip? All that has happened is that, instead of thinking of the frame problem as an issue in the logic of confirmation, we are now invited to think of it as an issue in the theory of executive control (a change which there is, by the way, no reason to assume is for the better). More of this presently.
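The cross-referencing point can be made concrete with a toy graph; the frames and their links below are invented for this sketch. A breadth-first walk from any frame reaches essentially every other, so the notation by itself places no bound on which information a given problem may "call":

```python
from collections import deque

# Frames as a cross-reference graph (toy data, invented for this sketch).
# The frame for Socrates says, among other things, 'see Plato', and so on.
frames = {
    "Socrates":    ["Plato", "hemlock"],
    "Plato":       ["Socrates", "Forms"],
    "Forms":       ["Plato", "mathematics"],
    "hemlock":     ["poison"],
    "poison":      ["chemistry"],
    "mathematics": ["chemistry"],
    "chemistry":   [],
}

def reachable(start, graph):
    """Every frame reachable from `start` by following cross-references."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for ref in graph.get(node, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

# Starting from 'Socrates', every frame in the knowledge base lies on some
# path; which paths to traverse, and how far, is the frame problem again.
assert reachable("Socrates", frames) == set(frames)
```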
For now, let’s summarize the major line of argument. If we assume that central processes are Quineian and isotropic, then we ought to predict that certain kinds of problems will emerge when we try to construct psychological theories which simulate such processes or otherwise explain them; specifically, we should predict problems that involve the characterization of nonlocal computational mechanisms. By contrast, such problems should not loom large for theories of psychological modules. Since, by assumption, modular systems are informationally encapsulated, it follows that the computations they perform are relatively local. It seems to me that these predictions are in reasonably good accord with the way that the problems of cognitive science have in fact matured: the input systems appear to be primarily stimulus driven, hence to exploit computational processes that are relatively insensitive to the general structure of the organism’s belief system. Whereas, when we turn to the fixation of belief, we get a complex of problems that appear to be intractable precisely because they concern mental processes that aren’t local. Of these, the frame problem is, as we have seen, a microcosm.
I have been marshaling considerations in favor of the view that central processes are Quineian/isotropic. That is what the analogy to scientific confirmation suggests that they ought to be, and the structure of the problems that arise in attempts to model central processes is quite compatible with that view of them. I now add that the view of central processes as computationally global can perhaps claim some degree of neurological plausibility. The picture of the brain that it suggests is a reasonably decent first approximation to the kind of brain that it appears we actually have.
When we discussed input analyzers, I commented on the natural connection between informational encapsulation and fixed neural architecture. Roughly, standing restrictions on information flow imply the option of hardwiring. If, in the extreme case, system B is required to take note of information from system A and is allowed to take note of information from nowhere else, you might as well build your brain with a permanent neuroanatomical connection from A to B. It is, in short, reasonable to expect biases in the distribution of information to mental processes to show up as structural biases in neural architecture.
Consider, by contrast, Quineian/isotropic systems, where more or less any subsystem may want to talk to any other at more or less any time. In this case, you’d expect the corresponding neuroanatomy to be relatively diffuse. At the limit, you might as well have a random net, with each computational subsystem connected, directly or indirectly, with every other; a kind of wiring in which you get a minimum of stable correspondence between neuroanatomical form and psychological function. The point is that in Quineian/isotropic systems, it may be unstable, instantaneous connectivity that counts. Instead of hardwiring, you get a connectivity that changes from moment to moment as dictated by the interaction between the program that is being executed and the structure of the task in hand. The moral would seem to be that computational isotropy comports naturally with neural isotropy (with what Lashley called “equipotentiality” of neural structure) in much the same way that informational encapsulation comports naturally with the elaboration of neural hardwiring.
So, if input analysis is modular and thought is Quineian/isotropic, you might expect a kind of brain in which there is stable neural architecture associated with perception-and-language but not with thought. And, I suggest, this seems to be pretty much what we in fact find. There is, as I remarked above, quite a lot that can be said about the neural specificity of the perceptual and linguistic mechanisms: at worst we can enumerate in some detail the parts of the brain that handle them; and at best we can exhibit characteristic neural architecture in the areas where these functions are performed. And then there are the rest of the higher brain systems (cf. what used to be called “association cortex”), in which neural connectivity appears to go every which way and the form/function correspondence appears to be minimal. There is some historical irony in all this. Gall argued from a (vertical) faculty psychology to the macroscopic differentiation of the brain. Flourens, his archantagonist, argued from the unity of the Cartesian ego to the brain’s equipotentiality (see Bynum, 1976). The present suggestion is that they were both right.7
I am, heaven knows, not about to set up as an expert on neuropsychology, and I am painfully aware how impressionistic this all is. But while we’re collecting impressions, I think the following one is striking. A recent issue of Scientific American (September, 1979) was devoted to the brain. Its table of contents is quite as interesting as the papers it contains. There are, as you might expect, articles that cover the neuropsychology of language and of the perceptual mechanisms. But there is nothing on the neuropsychology of thought—presumably because nothing is known about the neuropsychology of thought. I am suggesting that there is a good reason why nothing is known about it—namely, that there is nothing to know about it. You get form/function correspondence for the modular processes (specifically, for the input systems); but, in the case of central processes, you get an approximation to universal connectivity, hence no stable neural architecture to write Scientific American articles about.
To put these claims in a nutshell: there are no content-specific central processes for the performance of which correspondingly specific neural structures have been identified. Everything we now know is compatible with the claim that central problem-solving is subserved by equipotential neural mechanisms. This is precisely what you would expect if you assume that the central cognitive processes are largely Quineian and isotropic.
1. This is, of course, an idealization; decisions about what to believe (subdoxastically or otherwise) do not, in general, succeed in making the optimal use of the available data. This consideration does not, however, affect the present point, which is just that such decisions must, of necessity, be sensitive to information from many different sources.
2. There is an assumption underlying this line of argument which the reader may not wish to grant: that the mechanisms that interface between vertical faculties have to be computational rather than, as one might say, merely mechanical. Old views of how language connects with perception (e.g., percepts are pictures and words are their associates) implicitly deny this assumption. It seems to me, however, that anyone who thinks seriously about what must be involved in deciding (e.g.) how to say what we see will accept the plausibility of the view that the mental processes that are implicated must be both computational and of formidable complexity.
3. Editors’ note: these properties are defined in Part III of (Fodor, 1983). Fodor argues that input systems are domain-specific, mandatory, provide only limited information to the central system, are informationally encapsulated, have ‘shallow’ outputs, and have characteristic breakdown patterns.
4. Stronger versions had it that each theory statement must be logically equivalent to some (finite?) conjunction of observation statements. For a sophisticated review of this literature, see Glymour (1980). Glymour takes exception to some aspects of the Quineian account of confirmation, but not for reasons that need concern us here.
5. It is often proposed (see, e.g., McCarthy, 1980) that a logic capable of coping with the frame problem will have to be ‘nonmonotonic’. (Roughly, a logic is monotonic when the addition of new postulates does not reduce the set of previously derivable theorems; nonmonotonic otherwise.) The point is that new beliefs don’t just get added on to the old set; rather, old beliefs are variously altered to accommodate the new ones. This is, however, hardly surprising on the analysis of the frame problem proposed in the text. For, on that account, the frame problem is not distinguishable from the problem of nondemonstrative confirmation, and confirmation relations are themselves typically nonmonotonic. For example, the availability of a new datum may necessitate the assignment of new confirmation levels to indefinitely many previously accepted hypotheses. Hence, if we think of the confirmation system as formalized, indefinitely many previously derivable formulas of the form ‘the level of H is L’ may become nontheorems whenever new data become available.
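The nonmonotonicity this note describes can be illustrated with the stock default-reasoning example (birds fly unless they are known to be penguins); the tiny rule evaluator below is an invention for the sketch, not McCarthy's formal treatment:

```python
# Default reasoning sketch: one default rule, "birds fly", which is blocked
# when the exception "penguin" is known. Adding a premise can remove a
# previously derivable conclusion, which monotonic logic forbids.

def derivable(facts):
    """Return the conclusions supported by `facts` under the one default rule."""
    conclusions = set(facts)
    if "bird(tweety)" in facts and "penguin(tweety)" not in facts:
        conclusions.add("flies(tweety)")  # default: birds fly
    return conclusions

before = derivable({"bird(tweety)"})
after = derivable({"bird(tweety)", "penguin(tweety)"})

# A monotonic logic would guarantee that `before` is a subset of `after`;
# here the new premise retracts a conclusion instead.
assert "flies(tweety)" in before
assert "flies(tweety)" not in after
```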
6. Since there is no particular relation between the frame problem and frames-cum-data structures, the nomenclature in this area could hardly be more confusing.
7. The localization dispute didn’t, of course, end with Gall and Flourens. For a useful, brief survey of its relatively modern history (since Wernicke), see Eggert (1977). It is of some interest—in passing—that Wernicke, committed localizationist though he was in respect of the language mechanisms, held that only “primary functions…can be referred to specific areas. All processes which exceed these primary functions (such as the synthesis of various perceptions into concepts and the complex functions such as thought and consciousness) are dependent upon the fiber bundles connecting different areas of the cortex” (p. 92). Barring the associationism, Wernicke’s picture is not very different from the one that we’ve been developing here.
Melanie Mitchell
2023
Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between periods of optimistic predictions and massive investment (“AI spring”) and periods of disappointment, loss of confidence, and reduced funding (“AI winter”). Even with today’s seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. One reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. In this chapter, I describe four fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field. I conclude by discussing the open questions spurred by these fallacies, including the age-old challenge of imbuing machines with humanlike common sense.
The year 2020 was supposed to herald the arrival of self-driving cars. Five years earlier, a headline in The Guardian predicted that “from 2020 you will become a permanent backseat driver” (Adams, 2015). In 2016, Business Insider assured us that “10 million self-driving cars will be on the road by 2020” (Business Insider Intelligence, 2016). Tesla Motors CEO, Elon Musk, promised in 2019 that “a year from now, we’ll have over a million cars with full self-driving, software…everything” (Hawkins, 2019). And 2020 was the target announced by several automobile companies to bring self-driving cars to market (McCormick, 2017; Kageyama, 2015).
Despite attempts to redefine “full self-driving” into existence (Baldwin, 2021), none of these predictions has come true. It’s worth quoting AI expert Drew McDermott on what can happen when over-optimism about AI systems—in particular, self-driving cars—turns out to be wrong:
Perhaps expectations are too high, and…this will eventually result in disaster. [S]uppose that five years from now [funding] collapses miserably as autonomous vehicles fail to roll. Every startup company fails. And there’s a big backlash so that you can’t get money for anything connected with AI. Everybody hurriedly changes the names of their research projects to something else. This condition [is] called the “AI Winter.” (McDermott et al., 1985)
What’s most notable is that McDermott’s warning is from 1985, when, like today, the field of AI was awash with confident optimism about the near future of machine intelligence. McDermott was writing about a cyclical pattern in the field. New, apparent breakthroughs would lead AI practitioners to predict rapid progress, successful commercialization, and the near-term prospects of “true AI.” Governments and companies would get caught up in the enthusiasm, and would shower the field with research and development funding. AI spring would be in bloom. When progress stalled, the enthusiasm, funding, and jobs would dry up. AI winter would arrive. Indeed, about five years after McDermott’s warning, a new AI winter set in.
In this chapter, I explore the reasons for the repeating cycle of overconfidence followed by disappointment in expectations about AI. I argue that over-optimism among the public, the media, and even experts can arise from several fallacies in how we talk about AI and in our intuitions about the nature of intelligence. Understanding these fallacies and their subtle influences may help guide the creation of more robust, trustworthy, and perhaps actually intelligent AI systems.
Overconfident predictions about AI are as old as the field itself. In 1958, for example, the New York Times reported on a demonstration by the US Navy of Frank Rosenblatt’s “perceptron” (a rudimentary precursor to today’s deep neural networks): “The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself, and be conscious of its existence” (The New York Times, 1958). This optimistic take was quickly followed by similar proclamations from AI pioneers, this time about the promise of logic-based “symbolic” AI. In 1960 Herbert Simon declared that “machines will be capable, within twenty years, of doing any work that a man can do” (Simon, 1960). The following year, Claude Shannon echoed this prediction: “I confidently expect that within a matter of 10 or 15 years, something will emerge from the laboratory which is not too far from the robot of science fiction fame” (IEEE Information Theory Society, 2016). And a few years later Marvin Minsky forecast that “within a generation…the problems of creating ‘artificial intelligence’ will be substantially solved” (Minsky, 1967).
The optimistic AI spring of the 1960s and early 1970s, reflected in these predictions, soon gave way to the first AI winter. Minsky and Papert’s 1969 book Perceptrons (Minsky and Papert, 1969) showed that the kinds of problems solvable by Rosenblatt’s perceptrons were very limited. In 1973 the Lighthill Report (Lighthill, 1973) in the UK and the Department of Defense’s “American Study Group” report in the US, commissioned by their respective governments to assess prospects for AI in the near future, were both extremely negative about those prospects. This led to sharp funding decreases and a downturn in enthusiasm for AI in both countries.
AI once again experienced an upturn in enthusiasm starting in the early 1980s with several new initiatives: the rise of “expert systems” in industry (Durkin, 1996); Japan’s huge investment in its “Fifth Generation” project (Gaines, 1984), which aimed for ambitious AI abilities as the core of a new generation of computing systems; the US’s responding “Strategic Computing Initiative” (Stefik, 1985), which provided large funding for progress into general AI; as well as a new set of efforts on neural networks (McClelland et al., 1986a,b), which generated new hopes for the field.
By the latter part of the 1980s, these optimistic hopes had all been dashed; again, none of these technologies had achieved the lofty promises that had been made. Expert systems, which rely on humans to create rules that capture expert knowledge of a particular domain, turned out to be brittle—that is, often unable to generalize or adapt when faced with new situations. The problem was that the human experts writing the rules actually rely on subconscious knowledge—what we might call “common sense”—that was not part of the system’s programming. The AI approaches pursued under the Fifth Generation project and Strategic Computing Initiative ran into similar problems of brittleness and lack of generality. The neural-network approaches of the 1980s and 1990s likewise worked well on relatively simple examples but lacked the ability to scale up to complex problems. Indeed, the late 1980s marked the beginning of a new AI winter, and the field’s reputation suffered. When I received my PhD in 1990, I was advised not to use the term “Artificial Intelligence” on my job applications.
At the 50th anniversary commemoration of the 1956 Dartmouth Summer Workshop that launched the field, AI pioneer John McCarthy, who had originally coined the term “Artificial Intelligence,” explained the issue succinctly: “AI was harder than we thought” (Moewes and Nürnberger, 2013).
The 1990s and 2000s saw the meteoric rise of machine learning: algorithms that create predictive models from data. Machine learning algorithms were typically inspired by statistics rather than by neuroscience or psychology, and were aimed at performing specific tasks rather than capturing general intelligence. Machine-learning practitioners were often quick to differentiate their discipline from the then-discredited field of AI.
However, around 2010, deep learning—in which brain-inspired multilayered neural networks are trained from data—emerged from its backwater position and rose to superstar status in machine learning. Deep neural networks had been around since the 1970s, but only recently, due to huge datasets scraped from the Web, fast parallel computing chips, and innovations in training methods, could these methods scale up enough to address a large number of previously unsolved AI challenges. Deep neural networks are what power all of the major AI advances we’ve seen in the past decade, including speech recognition, machine translation, chat bots, image recognition, game playing, and protein folding, among others.
Suddenly the term “AI” started to appear everywhere, and there was all at once a new round of optimism about the prospects of what has been variously called “general,” “true,” or “human-level” AI.
In surveys of AI researchers carried out in 2016 and 2018, the median prediction of those surveyed gave a 50 percent chance that human-level AI would be created by 2040–2060, though there was much variance of opinion, both for sooner and later estimates (Müller and Bostrom, 2016; Grace et al., 2018). Even some of the most well-known AI experts and entrepreneurs are in accord. Stuart Russell, co-author of a widely used textbook on AI, predicts that “superintelligent AI” will “probably happen in the lifetime of my children” (Russell, 2019b) and Sam Altman, CEO of the AI company OpenAI, predicts that within decades, computer programs “will do almost everything, including making new scientific discoveries that will expand our concept of ‘everything’” (Altman, 2021). Shane Legg, co-founder of Google DeepMind, predicted in 2008 that “human level AI will be passed in the mid-2020s” (Despres, 2008), and Facebook’s CEO, Mark Zuckerberg, declared in 2015 that “one of [Facebook’s] goals for the next five to ten years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition” (McCracken, 2015).
However, in spite of all the optimism, it didn’t take long for cracks to appear in deep learning’s façade of intelligence. It turns out that, like all AI systems of the past, deep-learning systems can exhibit brittleness—unpredictable errors when facing situations that differ from the training data. This is because such systems are susceptible to shortcut learning (Geirhos et al., 2020; Lapuschkin et al., 2019): learning statistical associations in the training data that allow the machine to produce correct answers but sometimes for the wrong reasons. In other words, these machines don’t learn the concepts we are trying to teach them, but rather they learn shortcuts to correct answers on the training set—and such shortcuts will not lead to good generalizations. Indeed, deep-learning systems often cannot learn the abstract concepts that would enable them to transfer what they have learned to new situations or tasks (Mitchell, 2021). Moreover, such systems are vulnerable to attack from “adversarial perturbations” (Moosavi-Dezfooli et al., 2017)—specially engineered changes to the input that are either imperceptible or irrelevant to humans but that induce the system to make errors.
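Shortcut learning is easy to reproduce in miniature. The sketch below is an illustrative synthetic task of my own devising, not from the chapter: a logistic-regression classifier is given a weak but genuine signal feature and a "shortcut" feature that correlates almost perfectly with the label in the training data but carries no information at test time.

```python
# Minimal sketch of shortcut learning on synthetic data (illustrative setup,
# not from the chapter). The model latches onto a spurious feature that is
# predictive only in the training distribution, and fails to generalize.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious_correlated):
    y = rng.integers(0, 2, size=n)
    signal = y + rng.normal(0.0, 2.0, size=n)          # weak, genuine feature
    if spurious_correlated:
        shortcut = y + rng.normal(0.0, 0.05, size=n)   # near-perfect proxy for y
    else:
        shortcut = rng.normal(0.0, 1.0, size=n)        # uninformative at test time
    return np.column_stack([signal, shortcut]), y

X_train, y_train = make_data(2000, spurious_correlated=True)
X_test, y_test = make_data(2000, spurious_correlated=False)

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # near-perfect: shortcut works
print("test accuracy: ", clf.score(X_test, y_test))    # collapses once shortcut is gone
```

Training accuracy is near perfect because the shortcut alone separates the classes; test accuracy falls toward chance once the spurious correlation disappears, even though the genuine signal is unchanged.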
Despite extensive research on the limitations of deep neural networks, the sources of their brittleness and vulnerability are still not completely understood. These networks, with their large number of parameters, are complicated systems whose decision-making mechanisms can be quite opaque. However, it seems clear from their non-humanlike errors and vulnerability to adversarial perturbations that these systems are not actually understanding the data they process, at least not in the human sense of “understand.” It’s still a matter of debate in the AI community whether such understanding can be achieved by adding network layers and more training data, or whether something more fundamental is missing.
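The "adversarial perturbations" mentioned above do not even require deep networks. The following sketch (my own illustrative setup, using the fast gradient sign method of Goodfellow and colleagues) attacks a plain linear classifier; what matters is high-dimensional input, where many tiny per-coordinate changes add up to a large change in the model's decision score.

```python
# Minimal sketch of an adversarial perturbation (fast gradient sign method)
# against a linear classifier on synthetic Gaussian data. Dataset and eps are
# illustrative assumptions, not from the chapter.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, d = 2000, 100
y = rng.integers(0, 2, size=n)
X = rng.normal(0.0, 1.0, size=(n, d)) + 0.3 * y[:, None]  # class 1 shifted 0.3 per dim

clf = LogisticRegression(max_iter=1000).fit(X, y)
w = clf.coef_[0]

# FGSM step: move each input by eps in the direction that increases the loss.
# For logistic loss the input gradient is (p - y) * w, so its sign is sign(w)
# for class-0 points and -sign(w) for class-1 points.
eps = 0.2
direction = np.where(y[:, None] == 1, -np.sign(w), np.sign(w))
X_adv = X + eps * direction

print("clean accuracy:      ", clf.score(X, y))
print("adversarial accuracy:", clf.score(X_adv, y))
```

In 100 dimensions, nudging every coordinate by only 0.2 shifts the decision score by roughly eps times the L1 norm of the weights, enough to flip most predictions even though no single feature changed noticeably.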
At the time of writing (mid-2021), several new deep-learning approaches are once again generating considerable optimism in the AI community. Some of the hottest new areas are transformer architectures using self-supervised (or “predictive”) learning (Devlin et al., 2018), meta-learning (Finn et al., 2017), and deep reinforcement learning (Arulkumaran et al., 2017); each of these has been cited as progress towards more general, humanlike AI. While these and other new innovations have shown preliminary promise, the AI cycle of springs and winters is likely to continue. The field continually advances in relatively narrow areas, but the path toward human-level AI is less clear.
In the next sections, I will argue that predictions about the likely timeline of human-level AI reflect our own biases and lack of understanding of the nature of intelligence. In particular, I describe four fallacies in our thinking about AI that seem most central to me. While these fallacies are well-known in the AI community, many assumptions made by experts still fall victim to these fallacies and give us a false sense of confidence about the near-term prospects of “truly” intelligent machines.
Advances on a specific AI task are often described as “a first step” towards more general AI. The chess-playing computer Deep Blue “was hailed as the first step of an AI revolution” (Aron, 2016). IBM described its Watson system as “a first step into cognitive systems, a new era of computing” (High, 2013). OpenAI’s GPT-3 language generator was called a “step toward general intelligence” (Alexander, 2019).
Indeed, if people see a machine do something amazing, albeit in a narrow area, they often assume the field is that much further along toward general AI. The philosopher Hubert Dreyfus (using a term coined by Yehoshua Bar-Hillel) called this a “first-step fallacy.” As Dreyfus characterized it, “The first-step fallacy is the claim that, ever since our first work on computer intelligence we have been inching along a continuum at the end of which is AI so that any improvement in our programs no matter how trivial counts as progress.” Dreyfus quotes an analogy made by his brother, the engineer Stuart Dreyfus: “It was like claiming that the first monkey that climbed a tree was making progress towards landing on the moon” (Dreyfus, 2012).
Like many AI experts before and after him, Dreyfus noted that the “unexpected obstacle” in the assumed continuum of AI progress has always been the problem of common sense. I will say more about this barrier of common sense in the last section.
While John McCarthy lamented that “AI was harder than we thought,” Marvin Minsky explained that this is because “easy things are hard” (Minsky, 1986). That is, the things that we humans do without much thought—looking out in the world and making sense of what we see, carrying on a conversation, walking down a crowded sidewalk without bumping into anyone—turn out to be the hardest challenges for machines. Conversely, it’s often easier to get machines to do things that are very hard for humans; for example, solving complex mathematical problems, mastering games like chess and Go, and translating sentences between hundreds of languages have all turned out to be relatively easier for machines. This is a form of what’s been called “Moravec’s paradox,” named after roboticist Hans Moravec, who wrote, “It is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” (Moravec, 1988).
This fallacy has influenced thinking about AI since the dawn of the field. AI pioneer Herbert Simon proclaimed, “Everything of interest in cognition happens above the 100-millisecond level—the time it takes you to recognize your mother” (Hofstadter, 1985). Simon is saying that, to understand cognition, we don’t have to worry about unconscious perceptual processes. This assumption is reflected in most of the symbolic AI tradition, which focuses on the process of reasoning about input that has already been perceived.
In the last decades, symbolic AI approaches have lost favor in the research community, which has largely been dominated by deep learning, which does address perception. However, the assumptions underlying this fallacy still appear in recent claims about AI. For example, in a 2016 article, deep-learning pioneer Andrew Ng was quoted echoing Simon’s assumptions, vastly underestimating the complexity of unconscious perception and thought: “If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future” (Ng, 2016).
More subtly, researchers at Google DeepMind, in talking about AlphaGo’s triumph, described the game of Go as one of “the most challenging of domains” (Silver et al., 2017). Challenging for whom? For humans, perhaps, but as psychologist Gary Marcus pointed out, there are domains, including games, that, while easy for humans, are much more challenging than Go for AI systems. One example is charades, which “requires acting skills, linguistic skills, and theory of mind” (Marcus, 2018b), abilities that are far beyond anything AI can accomplish today.
AI is harder than we think because we are largely unconscious of the complexity of our own thought processes. Hans Moravec explains his paradox this way: “Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much more powerful, though usually unconscious, sensorimotor knowledge. We are all prodigious Olympians in perceptual and motor areas, so good that we make the difficult look easy” (Moravec, 1988). Or more succinctly, Marvin Minsky notes, “In general, we’re least aware of what our minds do best” (Minsky, 1980).
The term “wishful mnemonic” was coined in a 1976 critique of AI by computer scientist Drew McDermott:
A major source of simple-mindedness in AI programs is the use of mnemonics like “UNDERSTAND” or “GOAL” to refer to programs and data structures.…If a researcher…calls the main loop of his program “UNDERSTAND,” he is (until proven innocent) merely begging the question. He may mislead a lot of people, most prominently himself.…What he should do instead is refer to this main loop as “G0034,” and see if he can convince himself or anyone else that G0034 implements some part of understanding.…Many instructive examples of wishful mnemonics by AI researchers come to mind once you see the point. (McDermott, 1976)
Now, many decades later, work on AI is replete with such wishful mnemonics—terms associated with human intelligence that are used to describe the behavior and evaluation of AI programs. Neural networks are loosely inspired by the brain, but with vast differences. Machine learning or deep learning methods do not really resemble learning in humans (or in non-human animals). Indeed, if a machine has learned something in the human sense of learn, we would expect that it would be able to use what it has learned in different contexts. However, it turns out that this is often not the case. In machine learning, there is an entire subfield called transfer learning that focuses on the still-open problem of how to enable machines to transfer what they have learned to new situations, an ability that is fundamental to human learning.
Indeed, the way we talk about machine abilities influences our conceptions of how general those abilities really are. Unintentionally providing real-world illustrations of McDermott’s warning, one of IBM’s top executives proclaimed that “Watson can read all of the health-care texts in the world in seconds” (Gustin, 2011). DeepMind co-founder Demis Hassabis tells us that “AlphaGo’s goal is to beat the best human players, not just mimic them” (Ji-hye, 2016). And AlphaGo’s lead researcher, David Silver, described one of the program’s matches thus: “We can always ask AlphaGo how well it thinks it’s doing during the game.…It was only towards the end of the game that AlphaGo thought it would win” (Shead, 2017). (Emphasis is mine in the quotations above.)
One could argue that such anthropomorphic terms are simply shorthand: IBM scientists know that Watson doesn’t read or understand in the way humans do; DeepMind scientists know that AlphaGo has no goals or thoughts in the way humans do and no humanlike conceptions of a “game” or of “winning.” However, such shorthand can be misleading to the public trying to understand these results (and to the media reporting on them) and can also unconsciously shape the way even AI experts think about their systems and how closely these systems resemble human intelligence.
McDermott’s “wishful mnemonics” referred to terms we use to describe AI programs, but the research community also uses wishful mnemonics in naming AI evaluation benchmarks after the skills we hope they test. For example, here are some of the most widely cited current benchmarks in the subarea of AI called “natural-language processing” (NLP): the “Stanford Question Answering Dataset” (Rajpurkar et al., 2016), the “RACE Reading Comprehension Dataset” (Lai et al., 2017), and the “General Language Understanding Evaluation” (Wang et al., 2019). In all of these benchmarks, the performance of the best machines has already exceeded that measured for humans (typically Amazon Mechanical Turk workers). This has led to headlines such as “New AI model exceeds human performance at question answering” (Costenaro, 2018); “Computers are getting better than humans at reading” (Pham, 2018); and “Microsoft’s AI model has outperformed humans in natural-language understanding” (Jawad, 2021). Given the names of these benchmark evaluations, it’s not surprising that people would draw such conclusions. The problem is, these benchmarks don’t actually measure general abilities for question-answering, reading comprehension, or natural-language understanding. The benchmarks test only very limited versions of these abilities; moreover, many of these benchmarks allow machines to learn shortcuts, as I described above—statistical correlations that machines can exploit to achieve high performance on the test without learning the actual skill being tested (McCoy et al., 2019; Linzen, 2020). While machines can outperform humans on these particular benchmarks, AI systems are still far from matching the more general human abilities we associate with the benchmarks’ names.
The idea that intelligence is something distinct from the body, whether as a non-physical substance or as dependent only on the brain, has a long history in philosophy and cognitive science.
The so-called “information-processing model of mind” arose in psychology in the mid-twentieth century. This model views the mind as a kind of computer, which inputs, stores, processes, and outputs information. The body does not play much of a role except in the input (perception) and output (behavior) stages. Under this view, cognition takes place wholly in the brain and is, in theory, separable from the rest of the body. An extreme corollary of this view is that, in the future, we will be able to “upload” our brains—and thus our cognition and consciousness—to computers (Woollaston, 2013).
The assumption that intelligence can in principle be “disembodied” is implicit in almost all work on AI throughout its history. One of the most influential ideas in early AI research was Newell and Simon’s “Physical Symbol System Hypothesis” (PSSH), which stated: “A physical symbol system has the necessary and sufficient means for general intelligent action” (Newell and Simon, 1976). The term “physical symbol system” refers to something much like a digital computer. The PSSH posits that general intelligence can be achieved in digital computers without incorporating any non-symbolic processes of brain or body. (For an insightful discussion of symbolic versus subsymbolic processes, see “Waking Up from the Boolean Dream” in Hofstadter (1985).)
Newell and Simon’s PSSH was a founding principle of the symbolic approach to AI, which dominated the field until the rise of statistical and neurally inspired machine learning in the 1990s and 2000s. However, these non-symbolic approaches also do not view the body as relevant to intelligence. Instead, neurally inspired approaches from 1980s connectionism to today’s deep neural networks generally assume that intelligence arises solely from brain structures and dynamics. Today’s deep neural networks are akin to the proverbial brain-in-a-vat: passively taking in data from the world and outputting instructions for behavior without actively interacting in the world with any kind of body. Of course, robots and autonomous vehicles are different in that they have a physical presence in the world, but to date the kinds of physical interactions they have, and the feedback to their “intelligence,” are quite limited.
The assumption that intelligence is all in the brain has led to speculation that, to achieve human-level AI, we simply need to scale up machines to match the brain’s “computing capacity” and then develop the appropriate “software” for this brain-matching “hardware.” For example, one philosopher wrote a report on the literature that concluded, “I think it more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create)” (Carlsmith, 2020). No body needed!
Top AI researchers have echoed the idea that scaling up hardware to match the brain will enable human-level artificial intelligence. For example, deep-learning pioneer Geoffrey Hinton predicted, “To understand [documents] at a human level, we’re probably going to need human-level resources and we have trillions of connections [in our brains].…But the biggest networks we have built so far only have billions of connections. So we’re a few orders of magnitude off, but I’m sure the hardware people will fix that” (Patterson and Gibson, 2017). Others have predicted that the “hardware fix”—the speed and memory capacity to finally enable human-level AI—will come in the form of quantum computers (Musser, 2018).
However, a growing cadre of researchers is questioning the basis of the “all in the brain” information-processing model for understanding intelligence and for creating AI. Writing about what he calls “the cul-de-sac of the computational metaphor,” computer scientist Rod Brooks argues, “The reason for why we got stuck in this cul-de-sac for so long was because Moore’s law just kept feeding us, and we kept thinking, ‘Oh, we’re making progress, we’re making progress, we’re making progress.’ But maybe we haven’t been” (Brooks, 2019). In fact, a number of cognitive scientists have argued for decades for the centrality of the body in all cognitive activities. One prominent proponent of these ideas, the psychologist Mark Johnson, writes of a research program on embodied cognition, gaining steam in the mid-1970s, that “began to provide converging evidence for the central role of our brains and bodies in everything we experience, think, and do” (Johnson, 2017). Psychologist Rebecca Fincher-Kiefer characterizes the embodied cognition paradigm this way: “Embodied cognition means that the representation of conceptual knowledge is dependent on the body: it is multimodal…, not amodal, symbolic, or abstract. This theory suggests that our thoughts are grounded, or inextricably associated with, perception, action, and emotion, and that our brain and body work together to have cognition” (Fincher-Kiefer, 2019).
The evidence for embodied cognition comes from a diverse set of disciplines. Research in neuroscience suggests, for example, that the neural structures controlling cognition are richly linked to those controlling sensory and motor systems, and that abstract thinking exploits body-based neural “maps” (Epstein et al., 2017). As neuroscientist Don Tucker noted, “There are no brain parts for disembodied cognition” (Tucker, 2007). Results from cognitive psychology and linguistics indicate that many, if not all, of our abstract concepts are grounded in physical, body-based internal models (Barsalou and Wiemer-Hastings, 2005), revealed in part by the systems of physically based metaphors found in everyday language (Lakoff and Johnson, 2008).
Several other disciplines, such as developmental psychology, add to evidence for embodied cognition. However, research in AI has mostly ignored these results, though there is a small group of researchers exploring these ideas in subareas known as “embodied AI,” “developmental robotics,” and “grounded language understanding,” among others.
Related to the theory of embodied cognition is the idea that the emotions and the “irrational” biases that go along with our deeply social lives—typically thought of as separate from intelligence, or as getting in the way of rationality—are actually key to what makes intelligence possible. AI is often thought of as aiming at a kind of “pure intelligence,” one that is independent of emotions, irrationality, and constraints of the body such as the need to eat and sleep. This assumption of the possibility of a purely rational intelligence can lead to lurid predictions about the risks we will face from future “superintelligent” machines.
For example, the philosopher Nick Bostrom asserts that a system’s intelligence and its goals are orthogonal; he argues that “any level of intelligence could be combined with any final goal” (Bostrom, 2014, 105). As an example, Bostrom imagines a hypothetical superintelligent AI system whose sole objective is to produce paperclips; this imaginary system’s superintelligence enables it to invent ingenious ways to produce paperclips, using up all of the Earth’s resources in doing so.
AI researcher Stuart Russell concurs with Bostrom on the orthogonality of intelligence and goals. “It is easy to imagine that a general-purpose intelligent system could be given more or less any objective to pursue, including maximizing the number of paper clips or the number of known digits of pi” (Russell, 2019b, 167). Russell worries about the possible outcomes of employing such a superintelligence to solve humanity’s problems: “What if a superintelligent climate control system, given the job of restoring carbon dioxide concentrations to preindustrial levels, believes the solution is to reduce the human population to zero?…If we insert the wrong objective into the machine and it is more intelligent than us, we lose” (Russell, 2019a).
The thought experiments proposed by Bostrom and Russell seem to assume that an AI system could be “superintelligent” without any basic humanlike common sense, yet while seamlessly preserving the speed, precision, and programmability of a computer. But these speculations about superhuman AI are plagued by flawed intuitions about the nature of intelligence. Nothing in our knowledge of psychology or neuroscience supports the possibility that “pure rationality” is separable from the emotions and cultural biases that shape our cognition and our objectives. Instead, what we’ve learned from research in embodied cognition is that human intelligence seems to be a strongly integrated system with closely interconnected attributes, including emotions, desires, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. It’s not at all clear that these attributes can be separated.
The four fallacies I have described reveal flaws in our conceptualizations of the current state of AI and our limited intuitions about the nature of intelligence. I have argued that these fallacies are at least in part why capturing humanlike intelligence in machines always turns out to be harder than we think.
These fallacies raise several questions for AI researchers. How can we assess actual progress toward “general” or “human-level” AI? How can we assess the difficulty of a particular domain for AI as compared with humans? How should we describe the actual abilities of AI systems without fooling ourselves and others with wishful mnemonics? To what extent can the various dimensions of human cognition (including cognitive biases, emotions, objectives, and embodiment) be disentangled? How can we improve our intuitions about what intelligence is?
These questions remain open. It’s clear that to make and assess progress in AI more effectively, we will need to develop a better vocabulary for talking about what machines can do. And more generally, we will need a better scientific understanding of intelligence as it manifests in different systems in nature. This will require AI researchers to engage more deeply with other scientific disciplines that study intelligence.
The notion of common sense is one aspect of intelligence that has recently been driving collaborations between AI researchers and cognitive scientists from several other disciplines, particularly cognitive development (e.g., see Turek, 2018). There have been many attempts in the history of AI to give humanlike common sense to machines,1 ranging from the logic-based approaches of John McCarthy (McCarthy, 1986) and Douglas Lenat (Lenat et al., 1990) to today’s deep-learning-based approaches (e.g., Zellers et al., 2019). “Common sense” is what AI researcher Oren Etzioni called “the dark matter of artificial intelligence,” noting, “It’s a little bit ineffable, but you see its effects on everything” (Knight, 2018). The term has become a kind of umbrella for what’s missing from today’s state-of-the-art AI systems (Davis and Marcus, 2015; Levesque, 2017). While common sense includes the vast amount of knowledge we humans have about the world, it also requires being able to use that knowledge to recognize and make predictions about the situations we encounter and to guide our actions in those situations. Giving machines common sense will require imbuing them with the very basic “core,” perhaps innate, knowledge that human infants possess about space, time, causality, and the nature of inanimate objects and other living agents (Spelke and Kinzler, 2007), the ability to abstract from particulars to general concepts, and the ability to make analogies from prior experience. No one yet knows how to capture such knowledge or abilities in machines. This is the current frontier of AI research, and one encouraging way forward is to tap into what’s known about the development of these abilities in young children. Interestingly, this was the approach recommended by Alan Turing in his 1950 paper that introduced the Turing test. Turing asks, “Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s?” (Turing, 1950).
In 1892, the psychologist William James said of psychology at the time, “This is no science; it is only the hope of a science” (James, 1892). This is a perfect characterization of today’s AI. Indeed, several researchers have made analogies between AI and the medieval practice of alchemy. In 1977, AI researcher Terry Winograd wrote, “In some ways [AI] is akin to medieval alchemy. We are at the stage of pouring together different combinations of substances and seeing what happens, not yet having developed satisfactory theories…but…it was the practical experience and curiosity of the alchemists which provided the wealth of data from which a scientific theory of chemistry could be developed” (Winograd, 1977). Four decades later, Eric Horvitz, director of Microsoft Research, concurred: “Right now, what we are doing is not a science but a kind of alchemy” (Metz, 2017). In order to understand the nature of true progress in AI, and in particular, why it is harder than we think, we need to move from alchemy to developing a scientific understanding of intelligence.
1. Some have questioned why we need machines to have humanlike cognition, but if we want machines to work with us in our human world, we will need them to have the same basic knowledge about the world that is the foundation of our own thinking.
Do machines merely act as if they are in touch with the world? Or can we be sure that they also have thoughts about things—that they generate states with content, which refer to objects in the world? One of the striking things about human minds is that we can represent the world, both as it is and as it might be. We can see the mountain peak; we can desire to climb it; we can call it to mind even when it’s not in view; we can imagine that it might be subtly or even vastly different than it in fact is (made of different kinds of rock, for example, or housing different flora and fauna); and we can even imagine mountains, beside crystal fountains, where the hens lay soft-boiled eggs. We can have conflicting thoughts about one and the same object, even if we don’t realize it: a mountain climber might desire to climb K2 and have no desire to scale Chogori, even though Chogori is K2. Franz Brentano first called attention to this set of phenomena in his Psychology from an Empirical Standpoint (1874/1973). It has come to be called intentionality. In fact, Brentano described intentionality as the mark of the mental: a feature had by all and only mental states.
Could an appropriately programmed computer also have intentionality? On the face of it, it might seem obvious and trivial that computers have internal states that “represent” other things. According to some of the views in part I, computers are defined in terms of their ability to manipulate symbols, and symbols usually represent things. But here we must be careful not to confuse two importantly different senses in which states can be about other things. On the one hand, we can talk about road signs, words in a book, or lines on a graph as “representing” other things, in the sense that someone could look at them and decipher their meanings. This has come to be called derivative intentionality, precisely because it requires for its existence an observer who supplies meaning to what would otherwise be mindless entities (paint on a sign, print on a page, etc.). The road signs, books, and graphs don’t themselves have a clue of what they are about. Original intentionality, in contrast, is the source of the persistent challenge: How can we build systems that imbue the world with meaning themselves rather than having it supplied by an interpreter? More generally, how can a physical world contain systems that represent things in this stronger sense? The essays in this part articulate this challenge for artificial intelligence (AI) and place constraints on possible solutions.
In chapter 11, Dennett argues that intentionality should not be understood in terms of the components of a system or how they interact, but in terms of the complexity of behavior that arises from their organization. An intentional system, by his account, is a system for which we gain considerable predictive leverage by taking the “intentional stance”; that is, by treating the system as if it has beliefs, desires, emotions (etc.) and predicting that the system will behave more or less rationally in light of those beliefs and desires. This view leaves the line between intentional systems and those that are merely biological or physical (for example) somewhat blurry. For some, Dennett’s view seems to leave the intentionality of the system in the “eye of the beholder,” and so raises the question of how it is possible for the physical world to contain creatures that take stances in the first place. Indeed, one might think, following Haugeland in chapter 2, that taking a stance is itself an exercise of intentionality. If so, intentionality can’t be reduced to stance-taking on pain of circularity.
In chapter 12, Searle argues that rationally describable action, or, indeed, any mere symbol manipulation by itself, will not suffice for intentionality. His Chinese Room thought experiment has become a field-defining articulation of the challenge of intentionality. It putatively shows that even a system that passes the Turing test using sophisticated symbol manipulation might not, in fact, understand what it is talking about. He considers many attempts to meet that challenge and rejects them all, holding ultimately that original intentionality (or understanding) emerges from the as-yet-undiscovered, unique biochemical properties of the brain.
In chapter 13, Boden defends what has come to be called the “systems reply” to Searle’s argument. According to this style of reply, Searle’s argument investigates the parts of a purportedly intentional system, finds no intentionality in its individual parts, and wrongly concludes that the system as a whole does not understand what is going on. Searle, in this view, commits a kind of compositional fallacy: assuming that the property of the whole must be possessed by its components taken individually. This reply has been the subject of considerable controversy, as discussed next.
Consistent with this kind of objection, Frances Egan argues in chapter 14 that computational systems are definable and can be individuated strictly on the basis of the mathematical functions that they instantiate. Nonetheless, when we explain a capacity of a system to solve a problem in terms of its computational components, the explanation has two parts: one that describes the computational system narrowly in terms of the mathematical function that it computes, and one that “glosses” that computational description in ways that make explicit how the states of the computer track features of the world such that, in executing the mathematical function, it would solve the problem. Yet, of course, Searle thinks that there is such a gloss too and denies that this suffices for original intentionality. Whether the presence of additional mathematical structure is sufficient to address the challenge remains an open question.
Intentionality is a thread running through much of contemporary philosophy of mind, and those wishing to assemble a deep focus on intentionality will find much relevant in other parts of this book. Haugeland’s introductory piece (chapter 2) raises intentionality as a central problem for cognitive science. Furthermore, his “Mind Embodied and Embedded” (chapter 22) explores the possibility that intentionality might be supplied by systems outside the head proper. Buckner takes on one aspect of intentionality in his discussion of how mechanical systems might perform different kinds of abstraction (chapter 15). And Haas’s discussion of reinforcement learning might be seen as taking some first steps to understand what it means for a computing system to give a damn, a key component in Haugeland’s conception of what a system would have to do in order to take stances (chapter 16).
History. Those looking to read more deeply about the history and development of the topic of intentionality might consider the following essays as entry points:
Naturalizing Intentionality. The scientific naturalist seeks to understand intentionality by showing that it can be explained in the familiar terms of scientific explanation: of law and regularity, of cause and effect, of structure and function. There are various important strands in the naturalist approach:
Reductionism. For those who wish to attack the problem of intentionality head on, one common strategy is to try to reduce intentional content to some kind of natural relation such as correlation, causation, counterfactual dependence, or selection.
Content Externalism. Finally, one significant dispute in the philosophy of mental content has concerned whether meanings are properly to be sought “in the head,” as features and relations among internal processing states, or rather, “in the world.” To understand the way that meaning is situated in the causal structure of things, it might be useful to look outside the computational system and into the external systems in which it is embedded, causally or otherwise. In a way, the questions here are: How much about the mind can an attention to internal causal commerce, including the manipulation of information, explain? What parts of that explanation are housed outside these systems?
Social Sources of Intentionality. Rather than seeking intentionality in the internal causal structure of a machine (such as a computer) or as the product of causal interactions between that thing and objects in the world, one might instead hope to understand intentionality as a social phenomenon, or more properly, as a matter of the relationship between an individual and a set of social norms and practices. According to this view, intentionality is not the biological product of a brain (as Searle suggests), nor is it merely a matter of states having been selected through learning and evolution (though that is surely part of the story, as Millikan would suggest). Rather, it is a social status achieved when one learns to behave and speak and think in conformity with the social rules that govern the use of concepts. To understand a natural language is to play the game of giving and asking for reasons at least tolerably well (see, e.g., Haugeland’s “Intentionality Allstars” (1998); as well as Brandom, 1994). Original intentionality is explained in terms of stable dispositions, enforced by the members of a community, concerning such things as the proper use of terms, the formation of well-formed thoughts, the permissibility of inferences, and the extension of a term. Understanding of the sort wanted in the Chinese room is ultimately grounded in dispositions to conform to the rules governing the discourse and to hold others to their commitments as well. For an enactive form of this kind of view, see Gallagher and Miyahara (2012).
Conclusion. The topic of intentionality is perhaps a core outstanding and unresolved issue in the philosophy of mind, if not in philosophy generally. And perhaps this sketch suffices to say something about why the matter is so intractable in AI: the best ideas for how intentionality might be grounded in causal structures quite generally all have well-known obstacles, and there is little consensus about which of them, if any, stands the best chance.
Finding meaning in a world of causes is child’s play, of course, but precisely how even such simple and loveable creatures imbue the causal structure of the world around them, including other people and the things they do and say, with meaning, remains mysterious. It appears there is a style of explanation concerning people, their reasons, their motives, their aims and preferences, their understanding of the world around them that operates according to principles that seem not to mirror the way causal mechanisms work. This is why it is difficult to bring the mental and the rest of the world into a synoptic view (Sellars 1965), why the mental is said to be “anomalous” (Davidson, 1995), and why some have opted instead to “eliminate” the idea of propositional thought and mental content from our understanding of persons (as defended by Churchland, 1981 or Stich, 1996). The authors reveal key landmarks in this contested and difficult, and perennially puzzling, terrain.
Daniel C. Dennett
1981
There was a merchant in Baghdad who sent his servant to market to buy provisions and in a little while the servant came back, white and trembling, and said: “Master, just now when I was in the market-place I was jostled by a woman in the crowd and when I turned I saw it was Death that jostled me. She looked at me and made a threatening gesture; now, lend me your horse, and I will ride away from this city and avoid my fate. I will go to Samarra and there Death will not find me.” The merchant lent him his horse, and the servant mounted it, and he dug his spurs in its flanks and as fast as the horse could gallop he went. Then the merchant went down to the market-place and he saw me standing in the crowd, and he came to me and said: “Why did you make a threatening gesture to my servant when you saw him this morning?” “That was not a threatening gesture,” I said, “it was only a start of surprise. I was astonished to see him in Baghdad, for I had an appointment with him tonight in Samarra.”
—W. Somerset Maugham, DEATH SPEAKS
In the social sciences, talk about belief is ubiquitous. Since social scientists are typically self-conscious about their methods, there is also a lot of talk about talk about belief. And since belief is a genuinely curious and perplexing phenomenon, showing many different faces to the world, there is abundant controversy. Sometimes belief attribution appears to be a dark, risky, and imponderable business—especially when exotic, and more particularly religious or superstitious, beliefs are in the limelight. These are not the only troublesome cases; we also court argument and skepticism when we attribute beliefs to nonhuman animals, or to infants, or to computers or robots. Or when the beliefs we feel constrained to attribute to an apparently healthy adult member of our own society are contradictory, or even just wildly false. A biologist colleague of mine was once called on the telephone by a man in a bar who wanted him to settle a bet. The man asked: “Are rabbits birds?” “No” said the biologist. “Damn!” said the man as he hung up. Now could he really have believed that rabbits were birds? Could anyone really and truly be attributed that belief? Perhaps, but it would take a bit of a story to bring us to accept it.
In all of these cases, belief attribution appears beset with subjectivity, infected with cultural relativism, prone to “indeterminacy of radical translation”—clearly an enterprise demanding special talents: the art of phenomenological analysis, hermeneutics, empathy, Verstehen, and all that. On other occasions, normal occasions, when familiar beliefs are the topic, belief attribution looks as easy as speaking prose and as objective and reliable as counting beans in a dish. Particularly when these straightforward cases are before us, it is quite plausible to suppose that in principle (if not yet in practice) it would be possible to confirm these simple, objective belief attributions by finding something inside the believer’s head—by finding the beliefs themselves, in effect. “Look”, someone might say, “either you believe there’s milk in the fridge or you don’t believe there’s milk in the fridge” (you might have no opinion, in the latter case). But if you do believe this, that’s a perfectly objective fact about you, and it must come down in the end to your brain’s being in some particular physical state. If we knew more about physiological psychology, we could in principle determine the facts about your brain state and thereby determine whether or not you believe there is milk in the fridge, even if you were determined to be silent or disingenuous on the topic. In principle, on this view, physiological psychology could trump the results—or nonresults—of any “black box” method in the social sciences that divines beliefs (and other mental features) by behavioral, cultural, social, historical, external criteria.
These differing reflections congeal into two opposing views on the nature of belief attribution, and hence on the nature of belief. The latter, a variety of realism, likens the question of whether a person has a particular belief to the question of whether a person is infected with a particular virus—a perfectly objective internal matter of fact about which an observer can often make educated guesses of great reliability. The former, which we could call interpretationism if we absolutely had to give it a name, likens the question of whether a person has a particular belief to the question of whether a person is immoral, or has style, or talent, or would make a good wife. Faced with such questions, we preface our answers with “well, it all depends on what you’re interested in”, or make some similar acknowledgment of the relativity of the issue. “It’s a matter of interpretation”, we say. These two opposing views, so baldly stated, do not fairly represent any serious theorists’ positions, but they do express views that are typically seen as mutually exclusive and exhaustive; the theorist must be friendly with one and only one of these themes.
I think this is a mistake. My thesis will be that while belief is a perfectly objective phenomenon (that apparently makes me a realist), it can be discerned only from the point of view of one who adopts a certain predictive strategy, and its existence can be confirmed only by an assessment of the success of that strategy (that apparently makes me an interpretationist).
First I will describe the strategy, which I call the intentional strategy or adopting the intentional stance. To a first approximation, the intentional strategy consists of treating the object whose behavior you want to predict as a rational agent with beliefs and desires and other mental states exhibiting what Brentano and others call intentionality. The strategy has often been described before, but I shall try to put this very familiar material in a new light by showing how it works and by showing how well it works.
Then I will argue that any object—or as I shall say, any system—whose behavior is well predicted by this strategy is in the fullest sense of the word a believer. What it is to be a true believer is to be an intentional system, a system whose behavior is reliably and voluminously predictable via the intentional strategy. I have argued for this position before (1971; 1976; 1978b), and my arguments have so far garnered few converts and many presumed counterexamples. I shall try again here, harder, and shall also deal with several compelling objections.
There are many strategies, some good, some bad. Here is a strategy, for instance, for predicting the future behavior of a person: determine the date and hour of the person’s birth and then feed this modest datum into one or another astrological algorithm for generating predictions of the person’s prospects. This strategy is deplorably popular. Its popularity is deplorable only because we have such good reasons for believing that it does not work (pace Feyerabend, 1978). When astrological predictions come true this is sheer luck, or the result of such vagueness or ambiguity in the prophecy that almost any eventuality can be construed to confirm it. But suppose the astrological strategy did in fact work well on some people. We could call those people astrological systems—systems whose behavior was, as a matter of fact, predictable by the astrological strategy. If there were such people, such astrological systems, we would be more interested than most of us in fact are in how the astrological strategy works—that is, we would be interested in the rules, principles, or methods of astrology. We could find out how the strategy works by asking astrologers, reading their books, and observing them in action. But we would also be curious about why it worked. We might find that astrologers had no useful opinions about this latter question—they either had no theory of why it worked or their theories were pure hokum. Having a good strategy is one thing; knowing why it works is another.
So far as we know, however, the class of astrological systems is empty; so the astrological strategy is of interest only as a social curiosity. Other strategies have better credentials. Consider the physical strategy, or physical stance; if you want to predict the behavior of a system, determine its physical constitution (perhaps all the way down to the microphysical level) and the physical nature of the impingements upon it, and use your knowledge of the laws of physics to predict the outcome for any input. This is the grand and impractical strategy of Laplace for predicting the entire future of everything in the universe; but it has more modest, local, actually usable versions. The chemist or physicist in the laboratory can use this strategy to predict the behavior of exotic materials, but equally the cook in the kitchen can predict the effect of leaving the pot on the burner too long. The strategy is not always practically available, but that it will always work in principle is a dogma of the physical sciences. (I ignore the minor complications raised by the subatomic indeterminacies of quantum physics.)
Sometimes, in any event, it is more effective to switch from the physical stance to what I call the design stance, where one ignores the actual (possibly messy) details of the physical constitution of an object, and, on the assumption that it has a certain design, predicts that it will behave as it is designed to behave under various circumstances. For instance, most users of computers have not the foggiest idea what physical principles are responsible for the computer’s highly reliable, and hence predictable, behavior. But if they have a good idea of what the computer is designed to do (a description of its operation at any one of the many possible levels of abstraction), they can predict its behavior with great accuracy and reliability, subject to disconfirmation only in the cases of physical malfunction. Less dramatically, almost anyone can predict when an alarm clock will sound on the basis of the most casual inspection of its exterior. One does not know or care to know whether it is spring wound, battery driven, sunlight powered, made of brass wheels and jewel bearings or silicon chips—one just assumes that it is designed so that the alarm will sound when it is set to sound, and it is set to sound when it appears to be set to sound, and the clock will keep on running until that time and beyond, and is designed to run more or less accurately, and so forth. For more accurate and detailed design stance predictions of the alarm clock, one must descend to a less abstract level of description of its design; for instance, to the level at which gears are described, but their material is not specified.
Only the designed behavior of a system is predictable from the design stance, of course. If you want to predict the behavior of an alarm clock when it is pumped full of liquid helium, revert to the physical stance. Not just artifacts but also many biological objects (plants and animals, kidneys and hearts, stamens and pistils) behave in ways that can be predicted from the design stance. They are not just physical systems but designed systems.
Sometimes even the design stance is practically inaccessible, and then there is yet another stance or strategy one can adopt: the intentional stance. Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in many—but not all—instances yield a decision about what the agent ought to do; that is what you predict the agent will do.
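The steps just described—ascribe the beliefs the agent ought to have, ascribe the desires it ought to have, assume rationality, and predict the action that furthers those desires in the light of those beliefs—can be caricatured as a small prediction procedure. This is only an illustrative sketch; the function names and the miniature milk-in-the-fridge example are my own glosses, not anything in the text:

```python
# A toy caricature of the intentional strategy as a prediction procedure.
# All names and the miniature example below are illustrative assumptions.

def predict(situation, ascribe_beliefs, ascribe_desires, options, outcome):
    """Predict what a rational agent will do, from the intentional stance.

    1. Ascribe the beliefs the agent ought to have, given its situation.
    2. Ascribe the desires it ought to have, on the same considerations.
    3. Assume rationality: the agent picks the act whose believed outcome
       best furthers its desires.
    """
    beliefs = ascribe_beliefs(situation)
    desires = ascribe_desires(situation)
    # Rationality assumption: choose the option whose outcome, in the world
    # as the agent believes it to be, scores highest against its desires.
    return max(options, key=lambda act: desires(outcome(act, beliefs)))


# Miniature example: an agent that believes there is milk in the fridge
# and desires a drink is predicted to go to the fridge.
situation = {"milk_in_fridge": True}

ascribe_beliefs = lambda s: s  # attribute the truths the agent is exposed to
ascribe_desires = lambda s: (lambda result: 1 if result == "has_drink" else 0)

def outcome(act, beliefs):
    """What the agent believes each act would lead to."""
    if act == "go_to_fridge" and beliefs["milk_in_fridge"]:
        return "has_drink"
    return "thirsty"

prediction = predict(situation, ascribe_beliefs, ascribe_desires,
                     ["go_to_fridge", "stay_put"], outcome)
print(prediction)  # → go_to_fridge
```

The sketch makes one of Dennett's points mechanically visible: nothing in the procedure inspects the agent's insides; the beliefs and desires are attributed from outside, and the prediction stands or falls with the rationality assumption.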
The strategy becomes clearer with a little elaboration. Consider first how we go about populating each other’s heads with beliefs. A few truisms: sheltered people tend to be ignorant; if you expose someone to something he comes to know all about it. In general, it seems, we come to believe all the truths about the parts of the world around us we are put in a position to learn about. Exposure to x—that is, sensory confrontation with x over some suitable period of time—is the normally sufficient condition for knowing (or having true beliefs) about x. As we say, we come to know all about the things around us. Such exposure is only normally sufficient for knowledge, but this is not the large escape hatch it might appear; our threshold for accepting abnormal ignorance in the face of exposure is quite high. “I didn’t know the gun was loaded”, said by one who was observed to be present, sighted, and awake during the loading, meets with a variety of utter skepticism that only the most outlandish supporting tale could overwhelm.
Of course we do not come to learn or remember all the truths our sensory histories avail us. In spite of the phrase “know all about”, what we come to know, normally, are only all the relevant truths our sensory histories avail us. I do not typically come to know the ratio of spectacle-wearing people to trousered people in a room I inhabit, though if this interested me, it would be readily learnable. It is not just that some facts about my environment are below my thresholds of discrimination or beyond the integration and holding power of my memory (such as the height in inches of all the people present), but that many perfectly detectable, graspable, memorable facts are of no interest to me and hence do not come to be believed by me. So one rule for attributing beliefs in the intentional strategy is this: attribute as beliefs all the truths relevant to the system’s interests (or desires) that the system’s experience to date has made available. This rule leads to attributing somewhat too much—since we all are somewhat forgetful, even of important things. It also fails to capture the false beliefs we are all known to have. But the attribution of false belief, any false belief, requires a special genealogy, which will be seen to consist in the main in true beliefs. Two paradigm cases: S believes (falsely) that p, because S believes (truly) that Jones told him that p, that Jones is pretty clever, that Jones did not intend to deceive him,…and so on. Second case: S believes (falsely) that there is a snake on the barstool, because S believes (truly) that he seems to see a snake on the barstool, is himself sitting in a bar not a yard from the barstool he sees, and so forth. The falsehood has to start somewhere: the seed may be sown in hallucination, illusion, a normal variety of simple misperception, memory deterioration, or deliberate fraud, for instance; but the false beliefs that are reaped grow in a culture medium of true beliefs.
Then there are the arcane and sophisticated beliefs, true and false, that are so often at the focus of attention in discussions of belief attribution. They do not arise directly, goodness knows, from exposure to mundane things and events, but their attribution requires tracing out a lineage of mainly good argument or reasoning from the bulk of beliefs already attributed. An implication of the intentional strategy, then, is that true believers mainly believe truths. If anyone could devise an agreed-upon method of individuating and counting beliefs (which I doubt very much), we would see that all but the smallest portion (say, less than ten percent) of a person’s beliefs were attributable under our first rule.1
Note that this rule is a derived rule, an elaboration and further specification of the fundamental rule: attribute those beliefs the system ought to have. Note also that the rule interacts with the attribution of desires. How do we attribute the desires (preferences, goals, interests) on whose basis we will shape the list of beliefs? We attribute the desires the system ought to have. That is the fundamental rule. It dictates, on a first pass, that we attribute the familiar list of highest, or most basic, desires to people: survival, absence of pain, food, comfort, procreation, entertainment. Citing any one of these desires typically terminates the “Why?” game of reason giving. One is not supposed to need an ulterior motive for desiring comfort or pleasure or the prolongation of one’s existence. Derived rules of desire attribution interact with belief attributions. Trivially, we have the rule: attribute desires for those things a system believes to be good for it. Somewhat more informatively, attribute desires for those things a system believes to be the best means to other ends it desires. The attribution of bizarre and detrimental desires thus requires, like the attribution of false beliefs, special stories.
The interaction between belief and desire becomes trickier when we consider what desires we attribute on the basis of verbal behavior. The capacity to express desires in language opens the floodgates of desire attribution. “I want a two-egg mushroom omelet, some French bread and butter, and a half bottle of lightly chilled white Burgundy.” How could one begin to attribute a desire for anything so specific in the absence of such verbal declaration? How, indeed, could a creature come to contract such a specific desire without the aid of language? Language enables us to formulate highly specific desires, but it also forces us on occasion to commit ourselves to desires altogether more stringent in their conditions of satisfaction than anything we would otherwise have any reason to endeavor to satisfy. Since in order to get what you want you often have to say what you want, and since you often cannot say what you want without saying something more specific than you antecedently mean, you often end up giving others evidence (the very best of evidence, your unextorted word) that you desire things or states of affairs far more particular than would satisfy you—or better, than would have satisfied you, for once you have declared, being a man of your word, you acquire an interest in satisfying exactly the desire you declared and no other.
“I’d like some baked beans, please.”
“Yes sir. How many?”
You might well object to having such a specification of desire demanded of you, but in fact we are all socialized to accede to similar requirements in daily life—to the point of not noticing it, and certainly not feeling oppressed by it. I dwell on this because it has a parallel in the realm of belief, where our linguistic environment is forever forcing us to give—or concede—precise verbal expression to convictions that lack the hard edges verbalization endows them with (see Dennett 1969, pp. 184–85; Dennett 1978b). By concentrating on the results of this social force, while ignoring its distorting effect, one can easily be misled into thinking that it is obvious that beliefs and desires are rather like sentences stored in the head. Being language-using creatures, it is inevitable that we should often come to believe that some particular, actually formulated, spelled, and punctuated sentence is true, and that on other occasions we should come to want such a sentence to come true; but these are special cases of belief and desire and as such may not be reliable models for the whole domain.
That is enough, on this occasion, about the principles of belief and desire attribution to be found in the intentional strategy. What about the rationality one attributes to an intentional system? One starts with the ideal of perfect rationality and revises downward as circumstances dictate. That is, one starts with the assumption that people believe all the implications of their beliefs and believe no contradictory pairs of beliefs. This does not create a practical problem of clutter (infinitely many implications, for instance), for one is interested only in ensuring that the system one is predicting is rational enough to get to the particular implications that are relevant to its behavioral predicament of the moment. Instances of irrationality, or of finitely powerful capacities of inference, raise particularly knotty problems of interpretation, which I will set aside on this occasion (see Dennett, 1981; Cherniak, 1986).
For I want to turn from the description of the strategy to the question of its use. Do people actually use this strategy? Yes, all the time. There may someday be other strategies for attributing belief and desire and for predicting behavior, but this is the only one we all know now. And when does it work? It works with people almost all the time. Why would it not be a good idea to allow individual Oxford colleges to create and grant academic degrees whenever they saw fit? The answer is a long story, but very easy to generate. And there would be widespread agreement about the major points. We have no difficulty thinking of the reasons people would then have for acting in such ways as to give others reasons for acting in such ways as to give others reasons for…creating a circumstance we would not want. Our use of the intentional strategy is so habitual and effortless that the role it plays in shaping our expectations about people is easily overlooked. The strategy also works on most other mammals most of the time. For instance, you can use it to design better traps to catch those mammals, by reasoning about what the creature knows or believes about various things, what it prefers, what it wants to avoid. The strategy works on birds, and on fish, and on reptiles, and on insects and spiders, and even on such lowly and unenterprising creatures as clams (once a clam believes there is danger about, it will not relax its grip on its closed shell until it is convinced that the danger has passed). It also works on some artifacts: the chess-playing computer will not take your knight because it knows that there is a line of ensuing play that would lead to losing its rook, and it does not want that to happen. More modestly, the thermostat will turn off the boiler as soon as it comes to believe the room has reached the desired temperature.
The strategy even works for plants. In a locale with late spring storms, you should plant apple varieties that are particularly cautious about concluding that it is spring—which is when they want to blossom, of course. It even works for such inanimate and apparently undesigned phenomena as lightning. An electrician once explained to me how he worked out how to protect my underground water pump from lightning damage: lightning, he said, always wants to find the best way to ground, but sometimes it gets tricked into taking second-best paths. You can protect the pump by making another, better path more obvious to the lightning.
Now clearly this is a motley assortment of “serious” belief attributions, dubious belief attributions, pedagogically useful metaphors, façons de parler, and, perhaps worse, outright frauds. The next task would seem to be distinguishing those intentional systems that really have beliefs and desires from those we may find it handy to treat as if they had beliefs and desires. But that would be a Sisyphean labor, or else would be terminated by fiat. A better understanding of the phenomenon of belief begins with the observation that even in the worst of these cases, even when we are surest that the strategy works for the wrong reasons, it is nevertheless true that it does work, at least a little bit. This is an interesting fact, which distinguishes this class of objects, the class of intentional systems, from the class of objects for which the strategy never works. But is this so? Does our definition of an intentional system exclude any objects at all? For instance, it seems the lectern in this lecture room can be construed as an intentional system, fully rational, believing that it is currently located at the center of the civilized world (as some of you may also think), and desiring above all else to remain at that center. What should such a rational agent so equipped with belief and desire do? Stay put, clearly—which is just what the lectern does. I predict the lectern’s behavior, accurately, from the intentional stance, so is it an intentional system? If it is, anything at all is.
What should disqualify the lectern? For one thing, the strategy does not recommend itself in this case, for we get no predictive power from it that we did not antecedently have. We already knew what the lectern was going to do—namely nothing—and tailored the beliefs and desires to fit in a quite unprincipled way. In the case of people or animals or computers, however, the situation is different. In these cases often the only strategy that is at all practical is the intentional strategy; it gives us predictive power we can get by no other method. But, it will be urged, this is no difference in nature, but merely a difference that reflects upon our limited capacities as scientists. The Laplacean omniscient physicist could predict the behavior of a computer—or of a live human body, assuming it to be ultimately governed by the laws of physics—without any need for the risky, short-cut methods of either the design or intentional strategies. For people of limited mechanical aptitude, the intentional interpretation of a simple thermostat is a handy and largely innocuous crutch, but the engineers among us can quite fully grasp its internal operation without the aid of this anthropomorphizing. It may be true that the cleverest engineers find it practically impossible to maintain a clear conception of more complex systems, such as a time-sharing computer system or remote-controlled space probe, without lapsing into an intentional stance (and viewing these devices as asking and telling, trying and avoiding, wanting and believing), but this is just a more advanced case of human epistemic frailty. We would not want to classify these artifacts with the true believers—ourselves—on such variable and parochial grounds, would we? Would it not be intolerable to hold that some artifact or creature or person was a believer from the point of view of one observer, but not a believer at all from the point of view of another, cleverer observer? 
That would be a particularly radical version of interpretationism, and some have thought I espoused it in urging that belief be viewed in terms of the success of the intentional strategy. I must confess that my presentation of the view has sometimes invited that reading, but I now want to discourage it. The decision to adopt the intentional stance is free, but the facts about the success or failure of the stance, were one to adopt it, are perfectly objective.
Once the intentional strategy is in place, it is an extraordinarily powerful tool in prediction—a fact that is largely concealed by our typical concentration on the cases in which it yields dubious or unreliable results. Consider, for instance, predicting moves in a chess game. What makes chess an interesting game, one can see, is the unpredictability of one’s opponent’s moves, except in those cases where moves are “forced”—where there is clearly one best move—typically the least of the available evils. But this unpredictability is put in context when one recognizes that in the typical chess situation there are very many perfectly legal and hence available moves, but only a few—perhaps half a dozen—with anything to be said for them, and hence only a few high-probability moves according to the intentional strategy. Even when the intentional strategy fails to distinguish a single move with a highest probability, it can dramatically reduce the number of live options.
The same feature of the intentional strategy is apparent when it is applied to “real world” cases. It is notoriously unable to predict the exact purchase and sell decisions of stock traders, for instance, or the exact sequence of words a politician will utter when making a scheduled speech. But one’s confidence can be very high indeed about slightly less specific predictions: that the particular trader will not buy utilities today, or that the politician will side with the unions against his party, for example. This inability to predict fine-grained descriptions of actions, looked at another way, is a source of strength for the intentional strategy, for it is this neutrality with regard to details of implementation that permits one to exploit the intentional strategy in complex cases, for instance, in chaining predictions (see Dennett, 1978a). Suppose the US Secretary of State were to announce he was a paid agent of the KGB. What an unparalleled event! How unpredictable its consequences! Yet in fact we can predict dozens of not terribly interesting but perfectly salient consequences, and consequences of consequences. The President would confer with the rest of the Cabinet, which would support his decision to relieve the Secretary of State of his duties pending the results of various investigations, psychiatric and political, and all this would be reported at a news conference to people who would write stories that would be commented upon in editorials that would be read by people who would write letters to the editors, and so forth. None of that is daring prognostication, but note that it describes an arc of causation in space-time that could not be predicted under any description by any imaginable practical extension of physics or biology.
The power of the intentional strategy can be seen even more sharply with the aid of an objection first raised by Robert Nozick some years ago. Suppose, he suggested, some beings of vastly superior intelligence—from Mars, let us say—were to descend upon us, and suppose that we were to them as simple thermostats are to clever engineers. Suppose, that is, that they did not need the intentional stance—or even the design stance—to predict our behavior in all its detail. They can be supposed to be Laplacean super-physicists, capable of comprehending the activity on Wall Street, for instance, at the microphysical level. Where we see brokers and buildings and sell orders and bids, they see vast congeries of subatomic particles milling about—and they are such good physicists that they can predict days in advance what ink marks will appear each day on the paper tape labeled “Closing Dow Jones Industrial Average”. They can predict the individual behaviors of all the various moving bodies they observe without ever treating any of them as intentional systems. Would we be right then to say that from their point of view we really were not believers at all (any more than a simple thermostat is)? If so, then our status as believers is nothing objective, but rather something in the eye of the beholder—provided the beholder shares our intellectual limitations.
Our imagined Martians might be able to predict the future of the human race by Laplacean methods, but if they did not also see us as intentional systems, they would be missing something perfectly objective: the patterns in human behavior that are describable from the intentional stance, and only from that stance, and that support generalizations and predictions. Take a particular instance in which the Martians observe a stockbroker deciding to place an order for 500 shares of General Motors. They predict the exact motions of his fingers as he dials the phone and the exact vibrations of his vocal cords as he intones his order. But if the Martians do not see that indefinitely many different patterns of finger motions and vocal cord vibrations—even the motions of indefinitely many different individuals—could have been substituted for the actual particulars without perturbing the subsequent operation of the market, then they have failed to see a real pattern in the world they are observing. Just as there are indefinitely many ways of being a spark plug—and one has not understood what an internal combustion engine is unless one realizes that a variety of different devices can be screwed into these sockets without affecting the performance of the engine—so there are indefinitely many ways of ordering 500 shares of General Motors, and there are societal sockets in which one of these ways will produce just about the same effect as any other. There are also societal pivot points, as it were, where which way people go depends on whether they believe that p, or desire A, and does not depend on any of the other infinitely many ways they may be alike or different.
Suppose, pursuing our Martian fantasy a little further, that one of the Martians were to engage in a predicting contest with an Earthling. The Earthling and the Martian observe (and observe each other observing) a particular bit of local physical transaction. From the Earthling’s point of view, this is what is observed. The telephone rings in Mrs. Gardner’s kitchen. She answers, and this is what she says: “Oh, hello dear. You’re coming home early? Within the hour? And bringing the boss to dinner? Pick up a bottle of wine on the way home then, and drive carefully.” On the basis of this observation, our Earthling predicts that a large metallic vehicle with rubber tires will come to a stop on the drive within one hour, disgorging two human beings, one of whom will be holding a paper bag containing a bottle containing an alcoholic fluid. The prediction is a bit risky, perhaps, but a good bet on all counts. The Martian makes the same prediction, but has to avail himself of much more information about an extraordinary number of interactions of which, so far as he can tell, the Earthling is entirely ignorant. For instance, the deceleration of the vehicle at intersection A, five miles from the house, without which there would have been a collision with another vehicle—whose collision course had been laboriously calculated over some hundreds of meters by the Martian. The Earthling’s performance would look like magic! How did the Earthling know that the human being who got out of the car and got the bottle in the shop would get back in? The coming true of the Earthling’s prediction, after all the vagaries, intersections, and branches in the paths charted by the Martian, would seem to anyone bereft of the intentional strategy as marvelous and inexplicable as the fatalistic inevitability of the appointment in Samarra. 
Fatalists—for instance, astrologers—believe that there is a pattern in human affairs that is inexorable, that will impose itself come what may, that is, no matter how the victims scheme and second-guess, no matter how they twist and turn in their chains. These fatalists are wrong, but they are almost right. There are patterns in human affairs that impose themselves, not quite inexorably but with great vigor, absorbing physical perturbations and variations that might as well be considered random; these are the patterns that we characterize in terms of the beliefs, desires, and intentions of rational agents.
No doubt you will have noticed, and been distracted by, a serious flaw in our thought experiment: the Martian is presumed to treat his Earthling opponent as an intelligent being like himself, with whom communication is possible, a being with whom one can make a wager, against whom one can compete. In short, a being with beliefs (such as the belief he expressed in his prediction) and desires (such as the desire to win the prediction contest). So if the Martian sees the pattern in one Earthling, how can he fail to see it in the others? As a bit of narrative, our example could be strengthened by supposing that our Earthling cleverly learned Martian (which is transmitted by X-ray modulation) and disguised himself as a Martian, counting on the species-chauvinism of these otherwise brilliant aliens to permit him to pass as an intentional system while not giving away the secret of his fellow human beings. This addition might get us over a bad twist in the tale, but might obscure the moral to be drawn: namely, the unavoidability of the intentional stance with regard to oneself and one’s fellow intelligent beings. This unavoidability is itself interest relative; it is perfectly possible to adopt a physical stance, for instance, with regard to an intelligent being, oneself included, but not to the exclusion of maintaining at the same time an intentional stance with regard to oneself at a minimum, and one’s fellows if one intends, for instance, to learn what they know (a point that has been powerfully made by Stuart Hampshire in a number of writings). We can perhaps suppose our super-intelligent Martians fail to recognize us as intentional systems, but we cannot suppose them to lack the requisite concepts.2 If they observe, theorize, predict, communicate, they view themselves as intentional systems.3 Where there are intelligent beings, the patterns must be there to be described, whether or not we care to see them.
It is important to recognize the objective reality of the intentional patterns discernible in the activities of intelligent creatures, but also important to recognize the incompleteness and imperfections in the patterns. The objective fact is that the intentional strategy works as well as it does, which is not perfectly. No one is perfectly rational, perfectly unforgetful, all-observant, or invulnerable to fatigue, malfunction, or design imperfection. This leads inevitably to circumstances beyond the power of the intentional strategy to describe, in much the same way that physical damage to an artifact, such as a telephone or an automobile, may render it indescribable by the normal design terminology for that artifact. How do you draw the schematic wiring diagram of an audio amplifier that has been partially melted, or how do you characterize the program state of a malfunctioning computer? In cases of even the mildest and most familiar cognitive pathology—where people seem to hold contradictory beliefs or to be deceiving themselves, for instance—the canons of interpretation of the intentional strategy fail to yield clear, stable verdicts about which beliefs and desires to attribute to a person.
Now a strong realist position on beliefs and desires would claim that in these cases the person in question really does have some particular beliefs and desires which the intentional strategy, as I have described it, is simply unable to divine. On the milder sort of realism I am advocating, there is no fact of the matter of exactly which beliefs and desires a person has in these degenerate cases, but this is not a surrender to relativism or subjectivism, for when and why there is no fact of the matter is itself a matter of objective fact. On this view one can even acknowledge the interest relativity of belief attributions and grant that given the different interests of different cultures, for instance, the beliefs and desires one culture would attribute to a member might be quite different from the beliefs and desires another culture would attribute to the very same person. But supposing that were so in a particular case, there would be the further facts about how well each of the rival intentional strategies worked for predicting the behavior of that person. We can be sure in advance that no intentional interpretation of an individual will work to perfection, and it may be that two rival schemes are about equally good, and better than any others we can devise. That this is the case is itself something about which there can be a fact of the matter. The objective presence of one pattern (with whatever imperfections) does not rule out the objective presence of another pattern (with whatever imperfections).
The bogey of radically different interpretations with equal warrant from the intentional strategy is theoretically important—one might better say metaphysically important—but practically negligible once one restricts one’s attention to the largest and most complex intentional systems we know: human beings.4
Until now I have been stressing our kinship to clams and thermostats, in order to emphasize a view of the logical status of belief attribution, but the time has come to acknowledge the obvious differences and say what can be made of them. The perverse claim remains: all there is to being a true believer is being a system whose behavior is reliably predictable via the intentional strategy, and hence all there is to really and truly believing that p (for any proposition p) is being an intentional system for which p occurs as a belief in the best (most predictive) interpretation. But once we turn our attention to the truly interesting and versatile intentional systems, we see that this apparently shallow and instrumentalistic criterion of belief puts a severe constraint on the internal constitution of a genuine believer, and thus yields a robust version of belief after all.
Consider the lowly thermostat, as degenerate a case of intentional system as could conceivably hold our attention for more than a moment. Going along with the gag, we might agree to grant it the capacity for about half a dozen different beliefs and fewer desires—it can believe the room is too cold or too hot, that the boiler is on or off, and that if it wants the room warmer it should turn on the boiler, and so forth. But surely this is imputing too much to the thermostat; it has no concept of heat or of a boiler, for instance. So suppose we de-interpret its beliefs and desires: it can believe the A is too F or G, and if it wants the A to be more F it should do K, and so forth. After all, by attaching the thermostatic control mechanism to different input and output devices, it could be made to regulate the amount of water in a tank, or the speed of a train, for instance. Its attachment to a heat sensitive transducer and a boiler is too impoverished a link to the world to grant any rich semantics to its belief-like states.
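The thermostat's half-dozen belief-like states can be made vivid in a minimal control-loop sketch. This is an illustration, not anything from the text; the class and attribute names are hypothetical.

```python
# A minimal sketch of the "lowly thermostat" as a degenerate intentional
# system. Its whole stock of belief-like states is here: the room is too
# cold or too hot, the boiler is on or off, and the conditional "if it
# wants the room warmer it should turn on the boiler".
class Thermostat:
    def __init__(self, setpoint):
        self.setpoint = setpoint     # its "desire": room at this temperature
        self.boiler_on = False       # its "belief" that the boiler is on/off

    def sense(self, room_temp):
        # "Believing the room is too cold" just is this comparison.
        too_cold = room_temp < self.setpoint
        # Acting on the conditional belief: too cold -> turn on the boiler.
        self.boiler_on = too_cold
        return self.boiler_on

t = Thermostat(setpoint=20)
t.sense(17)   # room too cold, so the boiler goes on
t.sense(22)   # room warm enough, so the boiler goes off
```

Note how thin the semantics is: nothing in the code mentions heat or boilers except our comments, which is exactly the de-interpretation point that follows.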
But suppose we then enrich these modes of attachment. Suppose we give it more than one way of learning about the temperature, for instance. We give it an eye of sorts that can distinguish huddled, shivering occupants of the room and an ear so that it can be told how cold it is. We give it some facts about geography so that it can conclude that it is probably in a cold place if it learns that its spatio-temporal location is Winnipeg in December. Of course giving it a visual system that is multipurpose and general—not a mere shivering-object detector—will require vast complications of its inner structure. Suppose we also give our system more behavioral versatility: it chooses the boiler fuel, purchases it from the cheapest and most reliable dealer, checks the weather stripping, and so forth. This adds another dimension of internal complexity; it gives individual belief-like states more to do, in effect, by providing more and different occasions for their derivation or deduction from other states, and by providing more and different occasions for them to serve as premises for further reasoning. The cumulative effect of enriching these connections between the device and the world in which it resides is to enrich the semantics of its dummy predicates, F and G and the rest. The more of this we add, the less amenable our device becomes to serving as the control structure of anything other than a room-temperature maintenance system. A more formal way of saying this is that the class of indistinguishably satisfactory models of the formal system embodied in its internal states gets smaller and smaller as we add such complexities; the more we add, the richer or more demanding or specific the semantics of the system, until eventually we reach systems for which a unique semantic interpretation is practically (but never in principle) dictated (see Hayes, 1979).
At that point we say this device (or animal or person) has beliefs about heat and about this very room, and so forth, not only because of the system’s actual location in, and operations on, the world, but because we cannot imagine another niche in which it could be placed where it would work (see also Dennett, 1982, 1987a).
Our original simple thermostat had a state we called a belief about a particular boiler, to the effect that it was on or off. Why about that boiler? Well, what other boiler would you want to say it was about? The belief is about the boiler because it is fastened to the boiler.5 Given the actual, if minimal, causal link to the world that happened to be in effect, we could endow a state of the device with meaning (of a sort) and truth conditions, but it was altogether too easy to substitute a different minimal link and completely change the meaning (in this impoverished sense) of that internal state. But as systems become perceptually richer and behaviorally more versatile, it becomes harder and harder to make substitutions in the actual links of the system to the world without changing the organization of the system itself. If you change its environment, it will notice, in effect, and make a change in its internal state in response. There comes to be a two-way constraint of growing specificity between the device and the environment. Fix the device in any one state and it demands a very specific environment in which to operate properly (you can no longer switch it easily from regulating temperature to regulating speed or anything else); but at the same time, if you do not fix the state it is in, but just plunk it down in a changed environment, its sensory attachments will be sensitive and discriminative enough to respond appropriately to the change, driving the system into a new state, in which it will operate effectively in the new environment. There is a familiar way of alluding to this tight relationship that can exist between the organization of a system and its environment: you say that the organism continuously mirrors the environment, or that there is a representation of the environment in—or implicit in—the organization of the system.
It is not that we attribute (or should attribute) beliefs and desires only to things in which we find internal representations, but rather that, when we discover some object for which the intentional strategy works, we endeavor to interpret some of its internal states or processes as internal representations. What makes some internal feature of a thing a representation could only be its role in regulating the behavior of an intentional system.
Now the reason for stressing our kinship with the thermostat should be clear. There is no magic moment in the transition from a simple thermostat to a system that really has an internal representation of the world around it. The thermostat has a minimally demanding representation of the world, fancier thermostats have more demanding representations of the world, fancier robots for helping around the house would have still more demanding representations of the world. Finally you reach us. We are so multifariously and intricately connected to the world that almost no substitution is possible—though it is clearly imaginable in a thought experiment. Hilary Putnam imagines the planet Twin Earth, which is just like Earth right down to the scuff marks on the shoes of the Twin Earth replica of your neighbor, but which differs from Earth in some property that is entirely beneath the thresholds of your capacities to discriminate. (What they call water on Twin Earth has a different chemical analysis.) Were you to be whisked instantaneously to Twin Earth and exchanged for your Twin Earth replica, you would never be the wiser—just like the simple control system that cannot tell whether it is regulating temperature, speed, or volume of water in a tank. It is easy to devise radically different Twin Earths for something as simple and sensorily deprived as a thermostat, but your internal organization puts a much more stringent demand on substitution. Your Twin Earth and Earth must be virtual replicas or you will change state dramatically on arrival.
So which boiler are your beliefs about when you believe the boiler is on? Why, the boiler in your cellar (rather than its twin on Twin Earth, for instance). What other boiler would your beliefs be about? The completion of the semantic interpretation of your beliefs, fixing the referents of your beliefs, requires, as in the case of the thermostat, facts about your actual embedding in the world. The principles, and problems, of interpretation that we discover when we attribute beliefs to people are the same principles and problems we discover when we look at the ludicrous, but blessedly simple, problem of attributing beliefs to a thermostat. The differences are of degree, but nevertheless of such great degree that understanding the internal organization of a simple intentional system gives one very little basis for understanding the internal organization of a complex intentional system, such as a human being.
When we turn to the question of why the intentional strategy works as well as it does, we find that the question is ambiguous, admitting of two very different sorts of answer. If the intentional system is a simple thermostat, one answer is simply this: the intentional strategy works because the thermostat is well designed; it was designed to be a system that could be easily and reliably comprehended and manipulated from this stance. That is true, but not very informative, if what we are after are the actual features of its design that explain its performance. Fortunately, however, in the case of a simple thermostat those features are easily discovered and understood, so the other answer to our why question, which is really an answer about how the machinery works, is readily available.
If the intentional system in question is a person, there is also an ambiguity in our question. The first answer to the question of why the intentional strategy works is that evolution has designed human beings to be rational, to believe what they ought to believe and want what they ought to want. The fact that we are products of a long and demanding evolutionary process guarantees that using the intentional strategy on us is a safe bet. This answer has the virtues of truth and brevity, but it is also strikingly uninformative. The more difficult version of the question asks, in effect, how the machinery which Nature has provided us works. And we cannot yet give a good answer to that question. We just do not know. We do know how the strategy works, and we know the easy answer to the question of why it works, but knowing these does not help us much with the hard answer.
It is not that there is any dearth of doctrine, however. A Skinnerian behaviorist, for instance, would say that the strategy works because its imputations of beliefs and desires are shorthand, in effect, for as yet unimaginably complex descriptions of the effects of prior histories of response and reinforcement. To say that someone wants some ice cream is to say that in the past the ingestion of ice cream has been reinforced in him by the results, creating a propensity under certain background conditions (also too complex to describe) to engage in ice-cream-acquiring behavior. In the absence of detailed knowledge of those historical facts we can nevertheless make shrewd guesses on inductive grounds; these guesses are embodied in our intentional stance claims. Even if all this were true, it would tell us very little about the way such propensities were regulated by the internal machinery.
A currently more popular explanation is that the account of how the strategy works and the account of how the mechanism works will (roughly) coincide: for each predictively attributable belief, there will be a functionally salient internal state of the machinery, decomposable into functional parts in just about the same way the sentence expressing the belief is decomposable into parts—that is, words or terms. The inferences we attribute to rational creatures will be mirrored by physical, causal processes in the hardware; the logical form of the propositions believed will be copied in the structural form of the states in correspondence with them. This is the hypothesis that there is a language of thought coded in our brains, and our brains will eventually be understood as symbol manipulating systems in at least rough analogy with computers. Many different versions of this view are currently being explored, in the new research program called cognitive science, and provided one allows great latitude for attenuation of the basic, bold claim, I think some version of it will prove correct.
But I do not believe that this is obvious. Those who think that it is obvious, or inevitable, that such a theory will prove true (and there are many who do), are confusing two empirical claims. The first is that intentional stance description yields an objective, real pattern in the world—the pattern our imaginary Martians missed. This is an empirical claim, but one that is confirmed beyond skepticism. The second is that this real pattern is produced by another real pattern roughly isomorphic to it within the brains of intelligent creatures. Doubting the existence of the second real pattern is not doubting the existence of the first. There are reasons for believing in the second pattern, but they are not overwhelming. The best simple account I can give of the reasons is as follows.
As we ascend the scale of complexity from simple thermostat, through sophisticated robot, to human being, we discover that our efforts to design systems with the requisite behavior increasingly run afoul of the problem of combinatorial explosion. Increasing some parameter by, say, ten percent—ten percent more inputs or more degrees of freedom in the behavior to be controlled or more words to be recognized or whatever—tends to increase the internal complexity of the system being designed by orders of magnitude. Things get out of hand very fast and, for instance, can lead to computer programs that will swamp the largest, fastest machines. Now somehow the brain has solved the problem of combinatorial explosion. It is a gigantic network of billions of cells, but still finite, compact, reliable, and swift, and capable of learning new behaviors, vocabularies, theories, almost without limit. Some elegant, generative, indefinitely extensible principles of representation must be responsible. We have only one model of such a representation system: a human language. So the argument for a language of thought comes down to this: what else could it be? We have so far been unable to imagine any plausible alternative in any detail. That is a good reason, I think, for recommending as a matter of scientific tactics that we pursue the hypothesis in its various forms as far as we can.6 But we will engage in that exploration more circumspectly, and fruitfully, if we bear in mind that its inevitable rightness is far from assured. One does not well understand even a true empirical hypothesis so long as one is under the misapprehension that it is necessarily true.
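The combinatorial explosion Dennett describes can be made concrete with a toy calculation (my illustration, not the author's): a brute-force design that must respond correctly to every combination of n binary input features needs a lookup table with 2**n entries, so a ten percent increase in the parameter multiplies the design burden by orders of magnitude.

```python
# Toy illustration of combinatorial explosion: a system designed as a
# brute-force lookup over n binary input features must cover 2**n distinct
# input patterns, so table size grows exponentially in n.

def table_size(n_inputs: int) -> int:
    """Number of distinct input patterns a brute-force design must cover."""
    return 2 ** n_inputs

for n in (50, 55, 60):
    print(n, table_size(n))

# A "ten percent" step in the parameter (50 -> 55 inputs) multiplies the
# table by 2**5 = 32: not a ten percent increase in internal complexity,
# but a jump of orders of magnitude.
print(table_size(55) // table_size(50))  # prints 32
```

This is of course the worst case for the crudest possible design; Dennett's point is that any representational scheme that escapes this growth, as the brain evidently does, must exploit some generative, compositional structure.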
1. The idea that most of anyone’s beliefs must be true seems obvious to some people. Support for the idea can be found in works by Quine, Putnam, Shoemaker, Davidson, and myself. Other people find the idea equally incredible—so probably each side is calling a different phenomenon belief. Once one makes the distinction between belief and opinion (in my technical sense—Dennett, 1978b), according to which opinions are linguistically infected, relatively sophisticated cognitive states—roughly states of betting on the truth of a particular, formulated sentence—one can see the near triviality of the claim that most beliefs are true. A few reflections on peripheral matters should bring it out. Consider Democritus, who had a systematic, all-embracing, but (let us say, for the sake of argument) entirely false physics. He had things all wrong, though his views held together and had a sort of systematic utility. But even if every claim that scholarship permits us to attribute to Democritus (either explicit or implicit in his writings) is false, these represent a vanishingly small fraction of his beliefs, which include both the vast numbers of humdrum standing beliefs he must have had (about which house he lived in, what to look for in a good pair of sandals, and so forth) and also those occasional beliefs that came and went by the millions as his perceptual experience changed.
But, it may be urged, this isolation of his humdrum beliefs from his science relies on an insupportable distinction between truths of observation and truths of theory; all Democritus’s beliefs are theory-laden, and since his theory is false, they are false. The reply is as follows: Granted that all observation beliefs are theory-laden, why should we choose Democritus’s explicit, sophisticated theory (couched in his opinions) as the theory with which to burden his quotidian observations? Note that the least theoretical compatriot of Democritus also had myriads of theory-laden observation beliefs—and was, in one sense, none the wiser for it. Why should we not suppose Democritus’s observations are laden with the same (presumably innocuous) theory? If Democritus forgot his theory, or changed his mind, his observational beliefs would be largely untouched. To the extent that his sophisticated theory played a discernible role in his routine behavior and expectations and so forth, it would be quite appropriate to couch his humdrum beliefs in terms of the sophisticated theory, but this will not yield a mainly false catalogue of beliefs, since so few of his beliefs will be affected. (The effect of theory on observation is nevertheless often underrated. See Churchland (1979) for dramatic and convincing examples of the tight relationship that can sometimes exist between theory and experience.) (The discussion in this note was distilled from a useful conversation with Paul and Patricia Churchland and Michael Stack.)
2. A member of the audience in Oxford pointed out that if the Martian included the Earthling in his physical stance purview (a possibility I had not explicitly excluded), he would not be surprised by the Earthling’s prediction. He would indeed have predicted exactly the pattern of X-ray modulations produced by the Earthling speaking Martian. True, but as the Martian wrote down the results of his calculations, his prediction of the Earthling’s prediction would appear, word by Martian word, as on a Ouija board, and what would be baffling to the Martian was how this chunk of mechanism, the Earthling predictor dressed up like a Martian, was able to yield this true sentence of Martian when it was so informationally isolated from the events the Martian needed to know of in order to make his own prediction about the arriving automobile.
3. Might there not be intelligent beings who had no use for communicating, predicting, observing,…? There might be marvelous, nifty, invulnerable entities lacking these modes of action, but I cannot see what would lead us to call them intelligent.
4. John McCarthy’s analogy to cryptography nicely makes this point. The larger the corpus of cipher text, the less chance there is of dual, systematically unrelated decipherings. For a very useful discussion of the principles and presuppositions of the intentional stance applied to machines—explicitly including thermostats—see McCarthy, 1979.
5. This idea is the ancestor, in effect, of the species of different ideas lumped together under the rubric of de re belief. If one builds from this idea toward its scions, one can see better the difficulties with them, and how to repair them. (For more on this topic, see Dennett, 1982.)
6. The fact that all language-of-thought models of mental representation so far proposed fall victim to combinatorial explosion in one way or another should temper one’s enthusiasm for engaging in what Fodor aptly calls “the only game in town”.
John R. Searle
1980
What psychological and philosophical significance should we attach to recent efforts at computer simulations of human cognitive capacities? In answering this question, I find it useful to distinguish what I will call “strong” AI from “weak” or “cautious” AI. According to weak AI, the principal value of the computer in the study of the mind is that it gives us a very powerful tool. For example, it enables us to formulate and test hypotheses in a more rigorous and precise fashion than before. But according to strong AI, the computer is not merely a tool in the study of the mind; rather, the appropriately programmed computer really is a mind in the sense that computers given the right programs can be literally said to understand and have other cognitive states. And, according to strong AI, because the programmed computer has cognitive states, the programs are not mere tools that enable us to test psychological explanations; rather, the programs are themselves the explanations. I have no objection to the claims of weak AI, at least as far as this article is concerned. My discussion here will be directed to the claims I have defined as strong AI, specifically the claim that the appropriately programmed computer literally has cognitive states and that the programs thereby explain human cognition. When I refer to AI, it is the strong version as expressed by these two claims which I have in mind.
I will consider the work of Roger Schank and his colleagues at Yale (see, for instance, Schank and Abelson, 1977a), because I am more familiar with it than I am with any similar claims, and because it provides a clear example of the sort of work I wish to examine. But nothing that follows depends upon the details of Schank’s programs. The same arguments would apply to Winograd’s (1973) SHRDLU, Weizenbaum’s (1966) ELIZA, and indeed, any Turing-machine simulation of human mental phenomena.
Briefly, and leaving out the various details, one can describe Schank’s program as follows: the aim of the program is to simulate the human ability to understand stories. It is characteristic of the abilities of human beings to understand stories that they can answer questions about the story, even though the information they give was not explicitly stated in the story. Thus, for example, suppose you are given the following story: “A man went into a restaurant and ordered a hamburger. When the hamburger arrived, it was burned to a crisp, and the man stormed out of the restaurant angrily without paying for the hamburger or leaving a tip.” Now, if you are given the question “Did the man eat the hamburger?”, you will presumably answer, “No, he did not.” Similarly, if you are given the following story: “A man went into a restaurant and ordered a hamburger; when the hamburger came, he was very pleased with it; and as he left the restaurant he gave the waitress a large tip before paying his bill,” and you are asked the question “Did the man eat the hamburger?”, you will presumably answer, “Yes, he ate the hamburger.”
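The kind of script-based default inference described here can be caricatured in a few lines (a hypothetical sketch, not Schank's actual program; the cue strings and the answers they trigger are invented for illustration): the restaurant "script" supplies the default that a satisfied customer ate the food, while an angry early exit defeats that default.

```python
# A toy script-based question answerer. The "script" encodes background
# knowledge about restaurants: cues observed in the story select an answer
# to "Did the man eat the hamburger?" that the story never states outright.

RESTAURANT_SCRIPT = {
    # cue found in the story -> inferred answer
    "stormed out": "No, he did not.",
    "was very pleased": "Yes, he ate the hamburger.",
}

def did_he_eat(story: str) -> str:
    """Answer by matching story text against the script's cues."""
    for cue, answer in RESTAURANT_SCRIPT.items():
        if cue in story:
            return answer
    return "The story does not say."

print(did_he_eat("it was burned to a crisp, and the man stormed out angrily"))
# prints "No, he did not."
print(did_he_eat("he was very pleased with it, and left a large tip"))
# prints "Yes, he ate the hamburger."
```

Searle's question in what follows is precisely whether a system of this general shape, however elaborate, can be said to understand anything.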
Now Schank’s machines can similarly answer questions about restaurants in this fashion. In order to do so, they have a “representation” of the sort of information that human beings have about restaurants which enables them to answer such questions as those above, given these sorts of stories. When the machine is given the story and then asked the question, the machine will print out answers of the sort that we would expect human beings to give if told similar stories. Partisans of strong AI claim that in this question-and-answer sequence, not only is the machine simulating a human ability but also:
(a) The machine can literally be said to understand the story and provide answers to questions; and
(b) What the machine and its program do explains the human ability to understand the story and answer questions about it.
Claims (a) and (b) seem to me totally unsupported by Schank’s work, as I will attempt to show in what follows.1
A way to test any theory of mind is to ask oneself what it would be like if one’s own mind actually worked on the principles that the theory says all minds work on. Let us apply this test to the Schank program with the following Gedankenexperiment. Suppose that I am locked in a room and suppose that I’m given a large batch of Chinese writing. Suppose furthermore, as is indeed the case, that I know no Chinese either written or spoken, and that I’m not even confident that I could recognize Chinese writing as Chinese writing distinct from, say, Japanese writing or meaningless squiggles. Now suppose further that, after this first batch of Chinese writing, I am given a second batch of Chinese script together with a set of rules for correlating the second batch with the first batch. The rules are in English and I understand these rules as well as any other native speaker of English. They enable me to correlate one set of formal symbols with another set of formal symbols, and all that “formal” means here is that I can identify the symbols entirely by their shapes. Now suppose also that I am given a third batch of Chinese symbols together with some instructions, again in English, that enable me to correlate elements of this third batch with the first two batches, and these rules instruct me how I am to give back certain Chinese symbols with certain sorts of shapes in response to certain sorts of shapes given me in the third batch.
Unknown to me, the people who are giving me all of these symbols call the first batch a “script”, they call the second batch a “story”, and they call the third batch “questions”. Furthermore, they call the symbols I give them back in response to the third batch “answers to the questions”, and the set of rules in English that they gave me they call “the program”. To complicate the story a little bit, imagine that these people also give me stories in English which I understand, and they then ask me questions in English about these stories, and I give them back answers in English. Suppose also that after a while I get so good at following the instructions for manipulating the Chinese symbols and the programmers get so good at writing the programs that from the external point of view—that is, from the point of view of somebody outside the room in which I am locked—my answers to the questions are indistinguishable from those of native Chinese speakers. Nobody looking at my answers can tell that I don’t speak a word of Chinese. Let us also suppose that my answers to the English questions are, as they no doubt would be, indistinguishable from those of other native English speakers, for the simple reason that I am a native speaker of English. From the external point of view, from the point of view of someone reading my “answers”, the answers to the Chinese questions and the English questions are equally good. But in the Chinese case, unlike the English case, I produce the answers by manipulating uninterpreted formal symbols. As far as the Chinese is concerned, I simply behave like a computer; I perform computational operations on formally specified elements. For the purposes of the Chinese, I am simply an instantiation of the computer program.
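The purely formal character of the operator's task can be rendered as follows (my illustration, not Searle's; the symbol names are placeholders): every rule matches an input shape and emits an output shape, and nothing anywhere in the process interprets either.

```python
# "Computational operations on formally specified elements": the rule book
# pairs input shapes with output shapes. The operator identifies symbols
# entirely by their shapes; no step assigns them any meaning.

RULES = {
    # opaque input shape -> opaque output shape; the operator need never
    # know that these happen to encode question/answer pairs in Chinese
    "squiggle-squiggle": "squoggle-squoggle",
    "blip-blip": "blop-blop",
}

def operate(symbols: str) -> str:
    """Follow the rule book: find the shape, hand back the listed shape."""
    return RULES.get(symbols, "")  # no matching rule: hand back nothing

print(operate("squiggle-squiggle"))  # prints "squoggle-squoggle"
```

The point of the thought experiment is that adding more rules of this kind, however many, changes nothing in principle: the operator's relation to the symbols remains exhausted by their shapes.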
Now the claims made by strong AI are that the programmed computer understands the stories and that the program in some sense explains human understanding. But we are now in a position to examine these claims in light of our thought experiment.
(a) As regards the first claim, it seems to me obvious in the example that I do not understand a word of the Chinese stories. I have inputs and outputs that are indistinguishable from those of the native Chinese speaker, and I can have any formal program you like, but I still understand nothing. Schank’s computer, for the same reasons, understands nothing of any stories, whether in Chinese, English, or whatever, since in the Chinese case the computer is me; and in cases where the computer is not me, the computer has nothing more than I have in the case where I understand nothing.
(b) As regards the second claim—that the program explains human understanding—we can see that the computer and its program do not provide sufficient conditions of understanding, since the computer and the program are functioning and there is no understanding. But does it even provide a necessary condition or a significant contribution to understanding? One of the claims made by the supporters of strong AI is this: when I understand a story in English, what I am doing is exactly the same—or perhaps more of the same—as what I was doing in the case of manipulating the Chinese symbols. It is simply more formal symbol manipulation which distinguishes the case in English, where I do understand, from the case in Chinese, where I don’t. I have not demonstrated that this claim is false, but it would certainly appear an incredible claim in the example.
Such plausibility as the claim has derives from the supposition that we can construct a program that will have the same inputs and outputs as native speakers, and in addition we assume that speakers have some level of description where they are also instantiations of a program. On the basis of these two assumptions, we assume that even if Schank’s program isn’t the whole story about understanding, maybe it is part of the story. That is, I suppose, an empirical possibility, but not the slightest reason has so far been given to suppose it is true, since what is suggested—though certainly not demonstrated—by the example is that the computer program is irrelevant to my understanding of the story. In the Chinese case I have everything that artificial intelligence can put into me by way of a program, and I understand nothing; in the English case I understand everything, and there is so far no reason at all to suppose that my understanding has anything to do with computer programs—that is, with computational operations on purely formally specified elements.
As long as the program is defined in terms of computational operations on purely formally-defined elements, what the example suggests is that these by themselves have no interesting connection with understanding. They are certainly not sufficient conditions, and not the slightest reason has been given to suppose that they are necessary conditions or even that they make a significant contribution to understanding. Notice that the force of the argument is not simply that different machines can have the same input and output while operating on different formal principles—that is not the point at all—but rather that whatever purely formal principles you put into the computer will not be sufficient for understanding, since a human will be able to follow the formal principles without understanding anything, and no reason has been offered to suppose they are necessary or even contributory, since no reason has been given to suppose that when I understand English, I am operating with any formal program at all.
What is it, then, that I have in the case of the English sentences which I do not have in the case of the Chinese sentences? The obvious answer is that I know what the former mean but haven’t the faintest idea what the latter mean. In what does this consist, and why couldn’t we give it to a machine, whatever it is? Why couldn’t the machine be given whatever it is about me that makes it the case that I know what English sentences mean? I will return to these questions after developing my example a little more.
I have had occasions to present this example to several workers in artificial intelligence and, interestingly, they do not seem to agree on what the proper reply to it is. I get a surprising variety of replies, and in what follows I will consider the most common of these (specified along with their geographical origins). First I want to block out some common misunderstandings about “understanding”. In many of these discussions one finds fancy footwork about the word ‘understanding’. My critics point out that there are different degrees of understanding, that ‘understands’ is not a simple two-place predicate, that there are even different kinds and levels of understanding, and often the law of the excluded middle doesn’t even apply in a straightforward way to statements of the form ‘x understands y’, that in many cases it is a matter for decision and not a simple matter of fact whether x understands y. And so on.
To all these points I want to say: “Of course, of course.” But they have nothing to do with the points at issue. There are clear cases where ‘understands’ applies and clear cases where it does not apply; and such cases are all I need for this argument.2 I understand stories in English; to a lesser degree I can understand stories in French; to a still lesser degree, stories in German; and in Chinese, not at all. My car and my adding machine, on the other hand, understand nothing; they are not in that line of business.
We often attribute “understanding” and other cognitive predicates by metaphor and analogy to cars, adding machines, and other artifacts; but nothing is proved by such attributions. We say, “The door knows when to open because of its photoelectric cell”, “The adding machine knows how (understands how, is able) to do addition and subtraction but not division”, and “The thermostat perceives changes in the temperature”. The reason we make these attributions is interesting and has to do with the fact that in artifacts we extend our own intentionality;3 our tools are extensions of our purposes, and so we find it natural to make metaphorical attributions of intentionality to them. But I take it no philosophical ice is cut by such examples. The sense in which an automatic door “understands instructions” from its photoelectric cell is not at all the sense in which I understand English.
If the sense in which Schank’s programmed computers understand stories were supposed to be the metaphorical sense in which the door understands, and not the sense in which I understand English, the issue would not be worth discussing. Newell and Simon write that the sense of “understanding” they claim for computers is exactly the same as for human beings. I like the straightforwardness of this claim, and it is the sort of claim I will be considering. I will argue that, in that literal sense, the programmed computer understands what the car and the adding machine understand: exactly nothing. The computer’s understanding is not just (as in the case of my understanding of German) partial or incomplete; it is zero.
Now to the replies.
I. THE SYSTEMS REPLY (Berkeley): While it is true that the individual person who is locked in the room does not understand the story, the fact is that he is merely part of a whole system and the system does understand the story. The person has a large ledger in front of him in which are written the rules, he has a lot of scratch paper and pencils for doing calculations, he has “data banks” of sets of Chinese symbols. Now, understanding is not being ascribed to the mere individual; rather it is being ascribed to this whole system of which he is a part.
My response to the systems theory is simple. Let the individual internalize all of these elements of the system. He memorizes the rules in the ledger and the data banks of Chinese symbols, and he does all the calculations in his head. The individual then incorporates the entire system. There isn’t anything at all to the system which he does not encompass. We can even get rid of the room and suppose he works outdoors. All the same, he understands nothing of the Chinese, and a fortiori neither does the system, because there isn’t anything in the system which isn’t in him. If he doesn’t understand, then there is no way the system could understand because the system is just a part of him.
Actually I feel somewhat embarrassed even to give this answer to the systems theory because the theory seems to me so implausible to start with. The idea is that while a person doesn’t understand Chinese, somehow the conjunction of that person and some bits of paper might understand Chinese. It is not easy for me to imagine how someone who was not in the grip of an ideology would find the idea at all plausible. Still, I think many people who are committed to the ideology of strong AI will in the end be inclined to say something very much like this; so let us pursue it a bit further. According to one version of this view, while the man in the internalized systems example doesn’t understand Chinese in the sense that a native Chinese speaker does (because, for example, he doesn’t know that the story refers to restaurants and hamburgers, and so on), still “the man as formal symbol manipulation system” really does understand Chinese. The subsystem of the man which is the formal symbol manipulation system for Chinese should not be confused with the subsystem for English.
So there are really two subsystems in the man; one understands English, the other Chinese, and “it’s just that the two systems have little to do with each other”. But, I want to reply, not only do they have little to do with each other, they are not even remotely alike. The subsystem that understands English (assuming we allow ourselves to talk in this jargon of “subsystems” for a moment) knows that the stories are about restaurants and eating hamburgers, and the like; he knows that he is being asked questions about restaurants and that he is answering questions as best he can by making various inferences from the content of the story, and so on. But the Chinese system knows none of this; whereas the English subsystem knows that ‘hamburgers’ refers to hamburgers, the Chinese subsystem knows only that ‘squiggle-squiggle’ is followed by ‘squoggle-squoggle’. All he knows is that various formal symbols are being introduced at one end and are manipulated according to rules written in English, and that other symbols are going out at the other end.
The whole point of the original example was to argue that such symbol manipulation by itself couldn’t be sufficient for understanding Chinese in any literal sense because the man could write ‘squoggle-squoggle’ after ‘squiggle-squiggle’ without understanding anything in Chinese. And it doesn’t meet that argument to postulate subsystems within the man, because the subsystems are no better off than the man was in the first place; they still don’t have anything even remotely like what the English-speaking man (or subsystem) has. Indeed, in the case as described, the Chinese subsystem is simply a part of the English subsystem, a part that engages in meaningless symbol manipulation according to the rules of English.
Let us ask ourselves what is supposed to motivate the systems reply in the first place—that is, what independent grounds are there supposed to be for saying that the agent must have a subsystem within him that literally understands stories in Chinese? As far as I can tell, the only grounds are that in the example I have the same input and output as native Chinese speakers, and a program that goes from one to the other. But the point of the example has been to show that that couldn’t be sufficient for understanding, in the sense in which I understand stories in English, because a person, hence the set of systems that go to make up a person, could have the right combination of input, output, and program and still not understand anything in the relevant literal sense in which I understand English.
The only motivation for saying there must be a subsystem in me that understands Chinese is that I have a program and I can pass the Turing test: I can fool native Chinese speakers (see Turing, 1950; Chapter 6 of this volume). But precisely one of the points at issue is the adequacy of the Turing test. The example shows that there could be two “systems”, both of which pass the Turing test, but only one of which understands; and it is no argument against this point to say that, since they both pass the Turing test, they must both understand, since this claim fails to meet the argument that the system in me which understands English has a great deal more than the system which merely processes Chinese. In short, the systems reply simply begs the question by insisting without argument that the system must understand Chinese.
Furthermore, the systems reply would appear to lead to consequences that are independently absurd. If we are to conclude that there must be cognition in me on the grounds that I have a certain sort of input and output and a program in between, then it looks as though all sorts of noncognitive subsystems are going to turn out to be cognitive. For example, my stomach has a level of description where it does information processing, and it instantiates any number of computer programs, but I take it we do not want to say that it has any understanding. Yet if we accept the systems reply, it is hard to see how we can avoid saying that stomach, heart, liver, and so on, are all understanding subsystems, since there is no principled way to distinguish the motivation for saying the Chinese subsystem understands from saying that the stomach understands. (It is, by the way, not an answer to this point to say that the Chinese system has information as input and output and the stomach has food and food products as input and output, since from the point of view of the agent, from my point of view, there is no information in either the food or the Chinese; the Chinese is just so many meaningless squiggles. The information in the Chinese case is solely in the eyes of the programmers and the interpreters, and there is nothing to prevent them from treating the input and output of my digestive organs as information if they so desire.)
This last point bears on some independent problems in strong AI, and it is worth digressing for a moment to explain it. If strong AI is to be a branch of psychology, it must be able to distinguish systems which are genuinely mental from those which are not. It must be able to distinguish the principles on which the mind works from those on which nonmental systems work; otherwise it will offer us no explanations of what is specifically mental about the mental. And the mental/nonmental distinction cannot be just in the eye of the beholder—it must be intrinsic to the systems. For otherwise it would be up to any beholder to treat people as nonmental and, for instance, hurricanes as mental, if he likes.
But quite often in the AI literature the distinction is blurred in ways which would in the long run prove disastrous to the claim that AI is a cognitive inquiry. McCarthy, for example, writes: “Machines as simple as thermostats can be said to have beliefs, and having beliefs seems to be a characteristic of most machines capable of problem solving performance” (1979). Anyone who thinks strong AI has a chance as a theory of the mind ought to ponder the implications of that remark. We are asked to accept it as a discovery of strong AI that the hunk of metal on the wall which we use to regulate the temperature has beliefs in exactly the same sense that we, our spouses, and our children have beliefs, and furthermore that “most” of the other machines in the room—telephone, tape recorder, adding machine, electric light switch, and so on—also have beliefs in this literal sense. It is not the aim of this article to argue against McCarthy’s point, so I will simply assert the following without argument. The study of the mind starts with such facts as that humans have beliefs and thermostats, telephones, and adding machines don’t. If you get a theory that denies this point, you have produced a counter-example to the theory, and the theory is false.
One gets the impression that people in AI who write this sort of thing think they can get away with it because they don’t really take it seriously and they don’t think anyone else will either. I propose, for a moment at least, to take it seriously. Think hard for one minute about what would be necessary to establish that that hunk of metal on the wall over there has real beliefs, beliefs with direction of fit, propositional content, and conditions of satisfaction; beliefs that have the possibility of being strong beliefs or weak beliefs; nervous, anxious or secure beliefs; dogmatic, rational, or superstitious beliefs; blind faiths or hesitant cogitations; any kind of beliefs. The thermostat is not a candidate. Neither are stomach, liver, adding machine, or telephone. However, since we are taking the idea seriously, notice that its truth would be fatal to the claim of strong AI to be a science of the mind, for now the mind is everywhere. What we wanted to know is what distinguishes the mind from thermostats, livers, and the rest. And if McCarthy were right, strong AI wouldn’t have a hope of telling us that.
II THE ROBOT REPLY (Yale): Suppose we wrote a different kind of program from Schank’s program. Suppose we put a computer inside a robot, and this computer would not just take in formal symbols as input and give out formal symbols as output, but rather it would actually operate the robot in such a way that the robot does something very much like perceiving, walking, moving about, hammering nails, eating, drinking—anything you like. The robot would, for example, have a television camera attached to it that enabled it to see, it would have arms and legs that enabled it to act, and all of this would be controlled by its computer brain. Such a robot would, unlike Schank’s computer, have genuine understanding and other mental states.
The first thing to notice about the robot reply is that it tacitly concedes that cognition is not solely a matter of formal symbol manipulation, since this reply adds a set of causal relations with the outside world. But the answer to the robot reply is that the addition of such “perceptual” and “motor” capacities adds nothing by way of understanding, in particular, or intentionality, in general, to Schank’s original program. To see this, notice that the same thought experiment applies to the robot case. Suppose that, instead of the computer inside the robot, you put me inside the room and you give me again, as in the original Chinese case, more Chinese symbols with more instructions in English for matching Chinese symbols to Chinese symbols and feeding back Chinese symbols to the outside.
Now suppose also that, unknown to me, some of the Chinese symbols that come to me come from a television camera attached to the robot, and other Chinese symbols that I am giving out serve to make the motors inside the robot move the robot’s legs or arms. It is important to emphasize that all I am doing is manipulating formal symbols; I know none of these other facts. I am receiving “information” from the robot’s “perceptual” apparatus, and I am giving out “instructions” to its motor apparatus without knowing either of these facts. I am the robot’s homunculus, but unlike the traditional homunculus, I don’t know what’s going on. I don’t understand anything except the rules for symbol manipulation. Now in this case I want to say that the robot has no intentional states at all; it is simply moving about as a result of its electrical wiring and its program. And furthermore, by instantiating the program, I have no intentional states of the relevant type. All I do is follow formal instructions about manipulating formal symbols.
III THE BRAIN-SIMULATOR REPLY (Berkeley and MIT): Suppose we design a program that doesn’t represent information that we have about the world, such as the information in Schank’s scripts, but simulates the actual sequence of neuron firings at the synapses of the brain of a native Chinese speaker when he understands stories in Chinese and gives answers to them. The machine takes in Chinese stories and questions about them as input, it simulates the formal structure of actual Chinese brains in processing these stories, and it gives out Chinese answers as outputs. We can even imagine that the machine operates not with a single serial program but with a whole set of programs operating in parallel, in the manner that actual human brains presumably operate when they process natural language. Now surely in such a case we would have to say that the machine understood the stories; and if we refuse to say that, wouldn’t we also have to deny that native Chinese speakers understood the stories? At the level of the synapses what would or could be different about the program of the computer and the program of the Chinese brain?
Before addressing this reply, I want to digress to note that it is an odd reply for any partisan of artificial intelligence (functionalism, and so on) to make. I thought the whole idea of strong artificial intelligence is that we don’t need to know how the brain works to know how the mind works. The basic hypothesis, or so I had supposed, was that there is a level of mental operations that consists in computational processes over formal elements which constitute the essence of the mental, and can be realized in all sorts of different brain processes in the same way that any computer program can be realized in different computer hardware. On the assumptions of strong AI, the mind is to the brain as the program is to the hardware, and thus we can understand the mind without doing neurophysiology. If we had to know how the brain worked in order to do AI, we wouldn’t bother with AI.
However, even getting this close to the operation of the brain is still not sufficient to produce understanding. To see that this is so, imagine that instead of a monolingual man in a room shuffling symbols we have the man operate an elaborate set of water pipes with valves connecting them. When the man receives the Chinese symbols he looks up in the program, written in English, which valves he has to turn on and off. Each water connection corresponds to a synapse in the Chinese brain, and the whole system is rigged up so that after doing all the right firings—that is, after turning on all the right faucets—the Chinese answers pop out at the output end of the series of pipes.
Now where is the understanding in this system? It takes Chinese as input, it simulates the formal structure of the synapses of the Chinese brain, and it gives Chinese as output. But the man certainly doesn’t understand Chinese, and neither do the water pipes. And if we are tempted to adopt what I think is the absurd view that somehow the conjunction of man and water pipes understands, remember that in principle the man can internalize the formal structure of the water pipes and do all the “neuron firings” in his imagination. The problem with the brain simulator is that it is simulating the wrong things about the brain. As long as it simulates only the formal structure of the sequence of neuron firings at the synapses, it won’t have simulated what matters about the brain: its ability to produce intentional states. And that the formal properties are not sufficient for the causal properties is shown by the water pipe example. We can have all the formal properties carved off from the relevant neurobiological causal properties.
IV THE COMBINATION REPLY (Berkeley and Stanford): While each of the previous three replies might not be completely convincing by itself as a refutation of the Chinese room counter-example, if you take all three together they are collectively much more convincing and even decisive. Imagine a robot with a brain-shaped computer lodged in its cranial cavity; imagine the computer programmed with all the synapses of a human brain; imagine that the whole behavior of the robot is indistinguishable from human behavior; and now think of the whole thing as a unified system and not just as a computer with inputs and outputs. Surely in such a case we would have to ascribe intentionality to the system.
I entirely agree that in such a case we would find it rational and indeed irresistible to accept the hypothesis that the robot had intentionality, as long as we knew nothing more about it. Indeed, besides appearance and behavior, the other elements of the combination are really irrelevant. If we could build a robot whose behavior was indistinguishable over a large range from human behavior, we would attribute intentionality to it, pending some reason not to. We wouldn’t need to know in advance that its computer brain was a formal analogue of the human brain.
But I really don’t see that this is any help to the claims of strong AI, and here is why. According to strong AI, instantiating a formal program with the right input and output is a sufficient condition of, indeed is constitutive of, intentionality. As Newell (1980) puts it, the essence of the mental is the operation of a physical symbol system (see also Chapter 3 of this volume). But the attributions of intentionality that we make to the robot in this example have nothing to do with formal programs. They are simply based on the assumption that if the robot looks and behaves sufficiently like us, we would suppose, until proven otherwise, that it must have mental states like ours, which cause and are expressed by its behavior, and it must have an inner mechanism capable of producing such mental states. If we knew independently how to account for its behavior without such assumptions, we would not attribute intentionality to it, especially if we knew it had a formal program. And this is the point of my earlier response to the robot reply.
Suppose we knew that the robot’s behavior was entirely accounted for by the fact that a man inside it was receiving uninterpreted formal symbols from the robot’s sensory receptors and sending out uninterpreted formal symbols to its motor mechanisms, and the man was doing this symbol manipulation in accordance with a bunch of rules. Furthermore, suppose the man knows none of these facts about the robot; all he knows is which operations to perform on which meaningless symbols. In such a case we would regard the robot as an ingenious mechanical dummy. The hypothesis that the dummy has a mind would now be unwarranted and unnecessary, for there is now no longer any reason to ascribe intentionality to the robot or to the system of which it is a part (except of course for the man’s intentionality in manipulating the symbols). The formal symbol manipulations go on, the input and output are correctly matched, but the only real locus of intentionality is the man, and he doesn’t know any of the relevant intentional states; he doesn’t, for example, see what comes into the robot’s eyes, he doesn’t intend to move the robot’s arm, and he doesn’t understand any of the remarks made to or by the robot. Nor, for the reasons stated earlier, does the system of which man and robot are a part.
To see the point, contrast this case with cases where we find it completely natural to ascribe intentionality to members of certain other primate species, such as apes and monkeys, and to domestic animals, such as dogs. The reasons we find it natural are, roughly, two: we can’t make sense of the animal’s behavior without the ascription of intentionality, and we can see that the beasts are made of stuff similar to our own—eyes, a nose, skin, and so on. Given the coherence of the animal’s behavior and the assumption of the same causal stuff underlying it, we assume both that the animal must have mental states underlying its behavior, and that the mental states must be produced by mechanisms made out of the stuff that is like our stuff. We would certainly make similar assumptions about the robot unless we had some reason not to; but as soon as we knew that the behavior was the result of a formal program, and that the actual causal properties of the physical substance were irrelevant, we would abandon the assumption of intentionality.
There are two other responses to my example which come up frequently (and so are worth discussing) but really miss the point.
V THE OTHER-MINDS REPLY (Yale): How do you know that other people understand Chinese or anything else? Only by their behavior. Now the computer can pass the behavior tests as well as they can (in principle), so if you are going to attribute cognition to other people, you must in principle also attribute it to computers.
The objection is worth only a short reply. The problem in this discussion is not about how I know that other people have cognitive states, but rather what it is that I am attributing to them when I attribute cognitive states to them. The thrust of the argument is that it couldn’t be just computational processes and their output because there can be computational processes and their output without the cognitive state. It is no answer to this argument to feign anesthesia. In “cognitive sciences” one presupposes the reality and knowability of the mental in the same way that in physical sciences one has to presuppose the reality and knowability of physical objects.
VI THE MANY-MANSIONS REPLY (Berkeley): Your whole argument presupposes that AI is only about analogue and digital computers. But that just happens to be the present state of technology. Whatever these causal processes are that you say are essential for intentionality (assuming you are right), eventually we will be able to build devices that have these causal processes, and that will be artificial intelligence. So your arguments are in no way directed at the ability of artificial intelligence to produce and explain cognition.
I have no objection to this reply except to say that it in effect trivializes the project of strong artificial intelligence by redefining it as whatever artificially produces and explains cognition. The interest of the original claim made on behalf of artificial intelligence is that it was a precise, well-defined thesis: mental processes are computational processes over formally defined elements. I have been concerned to challenge that thesis. If the claim is redefined so that it is no longer that thesis, my objections no longer apply, because there is no longer a testable hypothesis for them to apply to.
Let us now return to the questions I promised I would try to answer. Granted that in my original example I understand the English and I do not understand the Chinese, and granted therefore that the machine doesn’t understand either English or Chinese, still there must be something about me that makes it the case that I understand English, and a corresponding something lacking in me which makes it the case that I fail to understand Chinese. Now why couldn’t we give the former something, whatever it is, to a machine?
I see no reason in principle why we couldn’t give a machine the capacity to understand English or Chinese, since in an important sense our bodies with our brains are precisely such machines. But I do see very strong arguments for saying that we could not give such a thing to a machine where the operation of the machine is defined solely in terms of computational processes over formally defined elements—that is, where the operation of the machine is defined as an instantiation of a computer program. It is not because I am the instantiation of a computer program that I am able to understand English and have other forms of intentionality. (I am, I suppose, the instantiation of any number of computer programs.) Rather, as far as we know, it is because I am a certain sort of organism with a certain biological (that is, chemical and physical) structure, and this structure under certain conditions is causally capable of producing perception, action, understanding, learning, and other intentional phenomena. And part of the point of the present argument is that only something that had those causal powers could have that intentionality. Perhaps other physical and chemical processes could produce exactly these effects; perhaps, for example, Martians also have intentionality, but their brains are made of different stuff. That is an empirical question, rather like the question whether photosynthesis can be done by something with a chemistry different from that of chlorophyll.
But the main point of the present argument is that no purely formal model will ever be by itself sufficient for intentionality, because the formal properties are not by themselves constitutive of intentionality, and they have by themselves no causal powers except the power, when instantiated, to produce the next state of the formalism when the machine is running. And any other causal properties which particular realizations of the formal model have are irrelevant to the formal model, because we can always put the same formal model in a different realization where those causal properties are obviously absent. Even if by some miracle Chinese speakers exactly realize Schank’s program, we can put the same program in English speakers, water pipes, or computers, none of which understand Chinese, the program notwithstanding.
What matters about brain operation is not the formal shadow cast by the sequence of synapses but rather the actual properties of the sequences. All arguments for the strong version of artificial intelligence that I have seen insist on drawing an outline around the shadows cast by cognition and then claiming that the shadows are the real thing.
By way of concluding I want to state some of the general philosophical points implicit in the argument. For clarity I will try to do it in a question-and-answer fashion, and I begin with that old chestnut:
• Could a machine think?
The answer is, obviously: Yes. We are precisely such machines.
• Yes, but could an artifact, a man-made machine, think?
Assuming it is possible to produce artificially a machine with a nervous system, neurons with axons and dendrites, and all the rest of it, sufficiently like ours, again the answer to the question seems to be obviously: Yes. If you can exactly duplicate the causes, you can duplicate the effects. And indeed it might be possible to produce consciousness, intentionality, and all the rest of it, using chemical principles different from those human beings use. It is, as I said, an empirical question.
• OK, but could a digital computer think?
If by “digital computer” we mean anything at all which has a level of description where it can correctly be described as the instantiation of a computer program, then, since we are the instantiations of any number of computer programs and we can think, again the answer is, of course: Yes.
• But could something think, understand, and so on, solely by virtue of being a computer with the right sort of program? Could instantiating a program, the right program of course, by itself be a sufficient condition for understanding?
This I think is the right question to ask, though it is usually confused with one or more of the earlier questions, and the answer to it is: No.
• Why not?
Because the formal symbol manipulations by themselves don’t have any intentionality. They are meaningless—they aren’t even symbol manipulations, since the “symbols” don’t symbolize anything. In the linguistic jargon, they have only a syntax but no semantics. Such intentionality as computers appear to have is solely in the minds of those who program them and those who use them, those who send in the input and who interpret the output.
The aim of the Chinese room example was to try to show this by showing that, as soon as we put something into the system which really does have intentionality, a man, and we program the man with the formal program, you can see that the formal program carries no additional intentionality. It adds nothing, for example, to a man’s ability to understand Chinese.
Precisely that feature of AI which seemed so appealing—the distinction between the program and the realization—proves fatal to the claim that simulation could be duplication. The distinction between the program and its realization in the hardware seems to be parallel to the distinction between the level of mental operations and the level of brain operations. And if we could describe the level of mental operations as a formal program, it seems we could describe what was essential about the mind without doing either introspective psychology or neurophysiology of the brain. But the equation “Mind is to brain as program is to hardware” breaks down at several points, among them the following three.
First, the distinction between program and realization has the consequence that the same program could have all sorts of crazy realizations which have no form of intentionality. Weizenbaum (1976), for example, shows in detail how to construct a computer using a roll of toilet paper and a pile of small stones. Similarly, the Chinese story-understanding program can be programmed into a sequence of water pipes, a set of wind machines, or a monolingual English speaker—none of which thereby acquires an understanding of Chinese. Stones, toilet paper, wind, and water pipes are the wrong kind of stuff to have intentionality in the first place (only something that has the same causal powers as brains can have intentionality), and, though the English speaker has the right kind of stuff for intentionality, you can easily see that he doesn’t get any extra intentionality by memorizing the program, since memorizing it won’t teach him Chinese.
Second, the program is purely formal, but the intentional states are not in that way formal. They are defined in terms of their content, not their form. The belief that it is raining, for example, is defined not as a certain formal shape, but as a certain mental content, with conditions of satisfaction, a direction of fit, and so on (see Searle, 1979). Indeed, the belief as such hasn’t even got a formal shape in this syntactical sense, since one and the same belief can be given an indefinite number of different syntactical expressions in different linguistic systems.
Third, as I mentioned before, mental states and events are a product of the operation of the brain, but the program is not in that way a product of the computer.
• Well if programs are in no way constitutive of mental processes, then why have so many people believed the converse? That at least needs some explanation.
I don’t know the answer to that. The idea that computer simulations could be the real thing ought to have seemed suspicious in the first place, because the computer isn’t confined to simulating mental operations, by any means. No one supposes that a computer simulation of a five-alarm fire will burn the neighborhood down, or that a computer simulation of a rainstorm will leave us all drenched. Why on earth would anyone suppose that a computer simulation of understanding actually understood anything? It is sometimes said that it would be frightfully hard to get computers to feel pain or fall in love, but love and pain are neither harder nor easier than cognition or anything else. For simulation, all you need is the right input and output and a program in the middle that transforms the former into the latter. That is all the computer has for anything it does. To confuse simulation with duplication is the same mistake, whether it is pain, love, cognition, fires, or rainstorms.
Still, there are several reasons why AI must have seemed, and to many people perhaps still does seem in some way to reproduce and thereby explain mental phenomena. And I believe we will not succeed in removing these illusions until we have fully exposed the reasons that give rise to them.
First, and perhaps most important, is a confusion about the notion of “information processing”. Many people in cognitive science believe that the human brain with its mind does something called “information processing”, and, analogously, the computer with its program does information processing; but fires and rainstorms, on the other hand, don’t do information processing at all. Thus, though the computer can simulate the formal features of any process whatever, it stands in a special relation to the mind and brain because, when the computer is properly programmed, ideally with the same program as the brain, the information processing is identical in the two cases, and this information processing is really the essence of the mental.
But the trouble with this argument is that it rests on an ambiguity in the notion of “information”. In the sense in which people “process information” when they reflect, say, on problems in arithmetic or when they read and answer questions about stories, the programmed computer does not do “information processing”. Rather, what it does is manipulate formal symbols. The fact that the programmer and the interpreter of the computer output use the symbols to stand for objects in the world is totally beyond the scope of the computer. The computer, to repeat, has a syntax but no semantics. Thus if you type into the computer “2 plus 2 equals?” it will type out “4”. But it has no idea that ‘4’ means 4, or that it means anything at all. And the point is not that it lacks some second-order information about the interpretation of its first-order symbols, but rather that its first-order symbols don’t have any interpretations as far as the computer is concerned. All the computer has is more symbols.
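The point that all the computer has is more symbols can be made concrete with a toy sketch. This is my own illustration, not a description of any real system: the “machine” below emits “4” only because a lookup table pairs one uninterpreted string with another; nothing in it parses ‘2’ as a number or performs addition.

```python
# A toy, purely formal "machine": shape-to-shape pairing, no semantics.
# The table entries are illustrative, not drawn from any real program.
RULES = {
    "2 plus 2 equals?": "4",
    "squiggle-squiggle": "squoggle-squoggle",
}

def manipulate(symbols: str) -> str:
    """Return the token the rule table pairs with the input token.

    '4' comes out only because the table pairs two uninterpreted
    shapes; the function never treats '2' or '4' as numbers.
    """
    return RULES.get(symbols, "")

print(manipulate("2 plus 2 equals?"))  # prints 4
```

Whatever semantics there is here lives with us, the readers of the table, exactly as the argument says.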
The introduction of the notion of “information processing” therefore produces a dilemma. Either we construe the notion of “information processing” in such a way that it implies intentionality as part of the process, or we don’t. If the former, then the programmed computer does not do information processing, it only manipulates formal symbols. If the latter, then, although the computer does information processing, it is only in the sense in which adding machines, typewriters, stomachs, thermostats, rainstorms, and hurricanes do information processing—namely, in the sense that there is a level of description at which we can describe them as taking information in at one end, transforming it, and producing information as output. But in this case it is up to outside observers to interpret the input and output as information in the ordinary sense. And no similarity is established between the computer and the brain in terms of any similarity of information processing in either of the two cases.
Secondly, in much of AI there is a residual behaviorism or operationalism. Since appropriately programmed computers can have input/output patterns similar to human beings, we are tempted to postulate mental states in the computer similar to human mental states. But once we see that it is both conceptually and empirically possible for a system to have human capacities in some realm without having any intentionality at all, we should be able to overcome this impulse. My desk adding machine has calculating capacities but no intentionality; and in this paper I have tried to show that a system could have input and output capabilities which duplicated those of a native Chinese speaker and still not understand Chinese, regardless of how it was programmed. The Turing test is typical of the tradition in being unashamedly behavioristic and operationalistic, and I believe that if AI workers totally repudiated behaviorism and operationalism, much of the confusion between simulation and duplication would be eliminated.
Third, this residual operationalism is joined to a residual form of dualism; indeed, strong AI only makes sense given the dualistic assumption that where the mind is concerned the brain doesn’t matter. In strong AI (and in functionalism, as well) what matters are programs, and programs are independent of their realization in machines; indeed, as far as AI is concerned, the same program could be realized by an electronic machine, a Cartesian mental substance, or an Hegelian world spirit. The single most surprising discovery that I have made in discussing these issues is that many AI workers are shocked by my idea that actual human mental phenomena might be dependent on actual physical-chemical properties of actual human brains. But I should not have been surprised; for unless you accept some form of dualism, the strong-AI project hasn’t got a chance.
The project is to reproduce and explain the mental by designing programs; but unless the mind is not only conceptually but empirically independent of the brain, you cannot carry out the project, for the program is completely independent of any realization. Unless you believe that the mind is separable from the brain both conceptually and empirically—dualism in a strong form—you cannot hope to reproduce the mental by writing and running programs, since programs must be independent of brains or any other particular forms of instantiation. If mental operations consist of computational operations on formal symbols, it follows that they have no interesting connection with the brain, and the only connection would be that the brain just happens to be one of the indefinitely many types of machines capable of instantiating the program. This form of dualism is not the traditional Cartesian variety that claims there are two sorts of substances, but it is Cartesian in the sense that it insists that what is specifically mental about the mind has no intrinsic connection with the actual properties of the brain. This underlying dualism is masked from us by the fact that AI literature contains frequent fulminations against “dualism”. What the authors seem to be unaware of is that their position presupposes a strong version of dualism.
• Could a machine think?
My own view is that only a machine could think, and indeed only very special kinds of machines, namely brains and machines that had the same causal powers as brains. And that is the main reason why strong AI has had little to tell us about thinking: it has nothing to tell us about machines. By its own definition it is about programs, and programs are not machines. Whatever else intentionality is, it is a biological phenomenon, and it is likely to be as causally dependent on the specific biochemistry of its origins as are lactation, photosynthesis, or any other biological phenomena. No one would suppose that we could produce milk and sugar by running a computer simulation of the formal sequences in lactation and photosynthesis; but where the mind is concerned, many people are willing to believe in such a miracle, because of a deep and abiding dualism: the mind, they suppose, is a matter of formal processes and is independent of specific material causes in a way that milk and sugar are not.
In defense of this dualism, the hope is often expressed that the brain is a digital computer. (Early computers, by the way, were often called “electronic brains”.) But that is no help. Of course the brain is a digital computer. Since everything is a digital computer, brains are too. The point is that the brain’s causal capacity to produce intentionality cannot consist in its instantiating a computer program, since for any program you like it is possible for something to instantiate that program and still not have any mental states. Whatever it is that the brain does to produce intentionality, it cannot consist in instantiating a program, since no program by itself is sufficient for intentionality.
1. I am not saying, of course, that Schank himself is committed to these claims.
2. Also, “understanding” implies both the possession of mental (intentional) states and the truth (validity, success) of these states. For the purposes of this discussion, we are concerned only with the possession of the states.
3. Intentionality is by definition that feature of certain mental states by which they are directed at or are about objects and states of affairs in the world. Thus, beliefs, desires, and intentions are intentional states; undirected forms of anxiety and depression are not. (For further discussion, see Searle, 1979).
Margaret Boden
1988
John Searle, in his paper on ‘Minds, Brains, and Programs’ (1980; Chapter 12 in this volume), argues that computational theories in psychology are essentially worthless. He makes two main claims: that computational theories, being purely formal in nature, cannot possibly help us to understand mental processes; and that computer hardware—unlike neuroprotein—obviously lacks the right causal powers to generate mental processes. I shall argue that both these claims are mistaken.
His first claim takes for granted the widely-held (formalist) assumption that the ‘computations’ studied in computer science are purely syntactic, that they can be defined (in terms equally suited to symbolic logic) as the formal manipulation of abstract symbols, by the application of formal rules. It follows, he says, that formalist accounts—appropriate in explaining the meaningless ‘information’-processing or ‘symbol’-manipulations in computers—are unable to explain how human minds employ information or symbols properly so-called. Meaning, or intentionality, cannot be explained in computational terms.
Searle’s point here is not that no machine can think. Humans can think, and humans—he allows—are machines; he even adopts the materialist credo that only machines can think. Nor is he saying that humans and programs are utterly incommensurable. He grants that, at some highly abstract level of description, people (like everything else) are instantiations of digital computers. His point, rather, is that nothing can think, mean, or understand solely in virtue of its instantiating a computer program.
To persuade us of this, Searle employs an ingenious thought-experiment. He imagines himself locked in a room, in which there are various slips of paper with doodles on them; a window through which people can pass further doodle-papers to him, and through which he can pass papers out; and a book of rules (in English) telling him how to pair the doodles, which are always identified by their shape or form. Searle spends his time, while inside the room, manipulating the doodles according to the rules.
One rule, for example, instructs him that when squiggle-squiggle is passed in to him, he should give out squoggle-squoggle. The rule-book also provides for more complex sequences of doodle-pairing, where only the first and last steps mention the transfer of paper into or out of the room. Before finding any rule directly instructing him to give out a slip of paper, he may have to locate a blongle doodle and compare it with a blungle doodle—in which case, it is the result of this comparison which determines the nature of the doodle he passes out. Sometimes many such doodle-doodle comparisons and consequent doodle-selections have to be made by him inside the room before he finds a rule allowing him to pass anything out.
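The rule-book can be sketched as a small production system. This is my own construction under the text’s description: the tokens ‘squiggle-squiggle’, ‘blongle’, and ‘blungle’ are just the placeholder shapes named above, and the particular rules are invented for illustration.

```python
# IF-THEN productions over opaque tokens. A rule fires when its whole
# pattern is present in the workspace; only the last rule here licenses
# passing a slip of paper out of the room.
RULES = [
    # (pattern required in workspace, tokens added, token passed out)
    (("squiggle-squiggle",), ("blongle",), None),
    (("blongle",), ("blungle",), None),
    (("blongle", "blungle"), (), "squoggle-squoggle"),
]

def run_room(passed_in: str) -> str:
    """Apply each rule once, in order, matching tokens by shape alone."""
    workspace = {passed_in}
    for pattern, additions, out in RULES:
        if all(token in workspace for token in pattern):
            workspace.update(additions)
            if out is not None:
                return out
    return ""  # no rule licensed passing anything out

print(run_room("squiggle-squiggle"))  # prints squoggle-squoggle
```

The internal blongle/blungle comparison determines what goes out, yet no rule ever consults what a token means; that is the sense in which the room is all syntax and no semantics.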
So far as Searle-in-the-room is concerned, the squiggles and squoggles are mere meaningless doodles. Unknown to him, however, they are Chinese characters. The people outside the room, being Chinese, interpret them as such. Moreover, the patterns passed in and out at the window are understood by them as questions and answers respectively: the rules happen to be such that most of the questions are paired, either directly or indirectly, with what they recognize as a sensible answer. But Searle himself (inside the room) knows nothing of this.
The point, says Searle, is that Searle-in-the-room is clearly instantiating a computer program. That is, he is performing purely formal manipulations of uninterpreted patterns: he is all syntax and no semantics.
The doodle-pairing rules are equivalent to the IF-THEN rules, or ‘productions’, commonly used (for example) in expert systems. Some of the internal doodle-comparisons could be equivalent to what AI workers in natural-language processing call a script—for instance, the restaurant script described by R. C. Schank and R. P. Abelson (1977b). In that case, Searle-in-the-room’s paper-passing performance would be essentially comparable to the performance of a ‘question-answering’ Schankian text-analysis program. But ‘question-answering’ is not question-answering. Searle-in-the-room is not really answering: how could he, since he cannot understand the questions? Practice does not help (except perhaps in making the doodle-pairing swifter): if Searle-in-the-room ever escapes, he will be just as ignorant of Chinese as he was when he was first locked in.
Certainly, the Chinese people outside might find it useful to keep Searle-in-the-room fed and watered, much as in real life we are willing to spend large sums of money on computerized ‘advice’ systems. But the fact that people who already possess understanding may use an intrinsically meaningless formalist computational system to provide what they interpret (sic) as questions, answers, designations, interpretations, or symbols is irrelevant. They can do this only if they can externally specify a mapping between the formalism and matters of interest to them. In principle, one and the same formalism might be mappable onto several different domains, so could be used (by people) in answering questions about any of those domains. In itself, however, it would be meaningless—as are the Chinese symbols from the point of view of Searle-in-the-room.
It follows, Searle argues, that no system can understand anything solely in virtue of its instantiating a computer program. For if it could, then Searle-in-the-room would understand Chinese. Hence, theoretical psychology cannot properly be grounded in computational concepts.
Searle’s second claim concerns what a proper explanation of understanding would be like. According to him, it would acknowledge that meaningful symbols must be embodied in something having ‘the right causal powers’ for generating understanding, or intentionality. Obviously, he says, brains do have such causal powers whereas computers do not. More precisely (since the brain’s organization could be paralleled in a computer), neuroprotein does whereas metal and silicon do not: the biochemical properties of the brain matter are crucial.
A. Newell’s (1980; see also Chapter 3 of this volume) widely cited definition of ‘physical-symbol systems’ is rejected by Searle, because it demands merely that symbols be embodied in some material that can implement formalist computations—which computers, admittedly, can do. In Searle’s view, no electronic computer can really manipulate symbols, nor really designate or interpret anything at all—irrespective of any causal dependencies linking its internal physical patterns to its behaviour. (This strongly realist view of intentionality contrasts with the instrumentalism of D. C. Dennett (1971; see also Chapter 11 of this volume). For Dennett, an intentional system is one whose behaviour we can explain, predict, and control only by ascribing beliefs, goals, and rationality to it. On this criterion, some existing computer programs are intentional systems, and the hypothetical humanoids beloved of science-fiction would be intentional systems a fortiori.)
Intentionality, Searle declares, is a biological phenomenon. As such, it is just as dependent on the underlying biochemistry as are photosynthesis and lactation. He grants that neuroprotein may not be the only substance in the universe capable of supporting mental life, much as substances other than chlorophyll may be able (on Mars, perhaps) to catalyse the synthesis of carbohydrates. But he rejects metal or silicon as potential alternatives, even on Mars. He asks whether a computer made out of old beer-cans could possibly understand—a rhetorical question to which the expected answer is a resounding ‘No!’ In short, Searle takes it to be intuitively obvious that the inorganic substances with which (today’s) computers are manufactured are essentially incapable of supporting mental functions.
In assessing Searle’s two-pronged critique of computational psychology, let us first consider his view that intentionality must be biologically grounded. One might be tempted to call this a positive claim, in contrast with his (negative) claim that purely formalist theories cannot explain mentality. However, this would be to grant it more than it deserves, for its explanatory power is illusory. The biological analogies mentioned by Searle are misleading, and the intuitions to which he appeals are unreliable.
The brain’s production of intentionality, we are told, is comparable to photosynthesis—but is it, really? We can define the products of photosynthesis, clearly distinguishing various sugars and starches within the general class of carbohydrates, and showing how these differ from other biochemical products such as proteins. Moreover, we not only know that chlorophyll supports photosynthesis, we also understand how it does so (and why various other chemicals cannot). We know that it is a catalyst rather than a raw material; and we can specify the point at which, and the subatomic process by which, its catalytic function is exercised. With respect to brains and understanding, the case is very different.
Our theory of what intentionality is (never mind how it is generated) does not bear comparison with our knowledge of carbohydrates: just what intentionality is is still philosophically controversial. We cannot even be entirely confident that we can recognize it when we see it. It is generally agreed that the propositional attitudes are intentional, and that feelings and sensations are not; but there is no clear consensus about the intentionality of emotions.
Various attempts have been made to characterize intentionality and to distinguish its subspecies as distinct intentional states (beliefs, desires, hopes, intentions, and the like). Searle himself has made a number of relevant contributions, from his early work on speech-acts (1969) to his more recent account (1983) of intentionality in general. A commonly used criterion (adopted by Brentano in the nineteenth century and also by Searle) is a psychological one. In Brentano’s words, intentional states direct the mind on an object; in Searle’s, they have intrinsic representational capacity, or ‘aboutness’; in either case they relate the mind to the world, and to possible worlds. But some writers define intentionality in logical terms (Chisholm, 1967). It is not even clear whether the logical and psychological definitions are precisely co-extensive (Boden, 1970). In brief, no theory of intentionality is accepted as unproblematic, as the chemistry of carbohydrates is.
As for the brain’s biochemical ‘synthesis’ of intentionality, this is even more mysterious. We have very good reason to believe that neuroprotein supports intentionality, but we have hardly any idea how—qua neuroprotein—it is able to do so.
In so far as we understand these matters at all, we focus on the neurochemical basis of certain informational functions—such as message-passing, facilitation, and inhibition—embodied in neurones and synapses. For example: how the sodium-pump at the cell-membrane enables an action potential to propagate along the axon; how electrochemical changes cause a neurone to enter into and recover from its refractory period; or how neuronal thresholds can be altered by neurotransmitters, such as acetylcholine.
With respect to a visual cell, for instance, a crucial psychological question may be whether it can function so as to detect intensity-gradients. If the neurophysiologist can tell us which molecules enable it to do so, so much the better. But from the psychological point of view, it is not the biochemistry as such which matters but the information-bearing functions grounded in it. (Searle apparently admits this when he says, ‘The type of realizations that intentional states have in the brain may be describable at a much higher functional level than that of the specific biochemistry of the neurons involved’ (1983, 272).)
As work in ‘computer vision’ has shown, metal and silicon are undoubtedly able to support some of the functions necessary for the 2D-to-3D mapping involved in vision. Moreover, they can embody specific mathematical functions for recognizing intensity-gradients (namely ‘DOG-detectors’, which compute the difference of Gaussians) which seem to be involved in many biological visual systems. Admittedly, it may be that metal and silicon cannot support all the functions involved in normal vision, or in understanding generally. Perhaps only neuroprotein can do so, so that only creatures with a ‘terrestrial’ biology can enjoy intentionality. But we have no specific reason, at present, to think so. Most important in this context, any such reasons we might have in the future must be grounded in empirical discovery: intuitions will not help.
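The DOG-detector mentioned here can be made concrete with a short sketch. The following is illustrative only, not drawn from any actual vision system; all parameter values and names are invented for the example. It builds a one-dimensional difference-of-Gaussians filter and runs it over an intensity step:

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete 1-D Gaussian, normalized to sum to 1."""
    values = [math.exp(-(x * x) / (2.0 * sigma * sigma))
              for x in range(-radius, radius + 1)]
    total = sum(values)
    return [v / total for v in values]

def dog_kernel(sigma_narrow=1.0, sigma_wide=1.6, radius=8):
    """Difference of Gaussians: narrow Gaussian minus wide Gaussian.
    Because both component kernels sum to 1, the DOG kernel sums to 0,
    so it responds to intensity-gradients and ignores uniform regions."""
    narrow = gaussian_kernel(sigma_narrow, radius)
    wide = gaussian_kernel(sigma_wide, radius)
    return [n - w for n, w in zip(narrow, wide)]

def convolve(signal, kernel):
    """'Same'-size convolution, replicating the border values."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)
            acc += signal[idx] * k
        out.append(acc)
    return out

# A one-dimensional 'image': a dark region followed by a bright region.
image_row = [0.0] * 20 + [1.0] * 20
response = convolve(image_row, dog_kernel())

# The response stays near zero in the uniform regions and peaks in
# magnitude at the step between them.
peak = max(range(len(response)), key=lambda i: abs(response[i]))
```

The filter's output has opposite signs on the two sides of the step and is negligible elsewhere, which is what makes it a gradient detector rather than a mere brightness detector.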
If one asks which mind-matter dependencies are intuitively plausible, the answer must be that none is. Nobody who was puzzled about intentionality (as opposed to action-potentials) ever exclaimed ‘Sodium—of course!’ Sodium-pumps are no less ‘obviously’ absurd than silicon chips, electrical polarities no less ‘obviously’ irrelevant than old beer-cans, acetylcholine hardly less surprising than beer. The fact that the first member of each of these three pairs is scientifically compelling does not make any of them intuitively intelligible: our initial surprise persists.
Our intuitions might change with the advance of science. Possibly we shall eventually see neuroprotein (and perhaps silicon too) as obviously capable of embodying mind, much as we now see biochemical substances in general (including chlorophyll) as obviously capable of producing other such substances—an intuition that was not obvious, even to chemists, prior to the synthesis of urea. At present, however, our intuitions have nothing useful to say about the material basis of intentionality. Searle’s ‘positive’ claim, his putative alternative explanation of intentionality, is at best a promissory note, at worst mere mystery-mongering.
Searle’s negative claim—that formal-computational theories cannot explain understanding—is less quickly rebutted. My rebuttal will involve two parts: the first directly addressing his example of the Chinese room, the second dealing with his background assumption (on which his example depends) that computer programs are pure syntax.
The Chinese-room example has engendered much debate, both within and outside the community of cognitive science. Some criticisms were anticipated by Searle himself in his original paper, others appeared as the accompanying peer-commentary (together with his Reply), and more have been published since. Here, I shall concentrate on only two points: what Searle calls the Robot reply, and what I shall call the English reply.
The Robot reply accepts that the only understanding of Chinese which exists in Searle’s example is that enjoyed by the Chinese people outside the room. Searle-in-the-room’s inability to connect Chinese characters with events in the outside world shows that he does not understand Chinese. Likewise, a Schankian teletyping computer that cannot recognize a restaurant, hand money to a waiter, or chew a morsel of food understands nothing of restaurants—even if it can usefully ‘answer’ our questions about them. But a robot, provided not only with a restaurant-script but also with camera-fed visual programs and limbs capable of walking and picking things up, would be another matter. If the input-output behaviour of such a robot were identical with that of human beings, then it would demonstrably understand both restaurants and the natural language—Chinese, perhaps—used by people to communicate with it.
Searle’s first response to the Robot reply is to claim a victory already, since the reply concedes that cognition is not solely a matter of formal symbol-manipulation but requires in addition a set of causal relations with the outside world. Second, Searle insists that to add perceptuomotor capacities to a computational system is not to add intentionality, or understanding.
He argues this point by imagining a robot which, instead of being provided with a computer program to make it work, has a miniaturized Searle inside it—in its skull, perhaps. Searle-in-the-robot, with the aid of a (new) rule-book, shuffles paper and passes squiggles and squoggles in and out, much as Searle-in-the-room did before him. But now some or all of the incoming Chinese characters are not handed in by Chinese people, but are triggered by causal processes in the cameras and audio-equipment in the robot’s eyes and ears. And the outgoing Chinese characters are not received by Chinese hands, but by motors and levers attached to the robot’s limbs—which are caused to move as a result. In short, this robot is apparently able not only to answer questions in Chinese, but also to see and do things accordingly: it can recognize raw beansprouts and, if the recipe requires it, toss them into the wok as well as the rest of us can.
(The work on computer vision mentioned above suggests that the vocabulary of Chinese would require considerable extension for this example to be carried through. And the large body of AI research on language-processing suggests that the same could be said of the English required to express the rules in Searle’s initial ‘question-answering’ example. In either case, what Searle-in-the-room needs is not so much Chinese, or even English, as a programming-language. We shall return to this point presently.)
Like his roombound predecessor, however, Searle-in-the-robot knows nothing of the wider context. He is just as ignorant of Chinese as he ever was, and has no more purchase on the outside world than he did in the original example. To him, beansprouts and woks are invisible and intangible: all Searle-in-the-robot can see and touch, besides the rule-book and the doodles, are his own body and the inside walls of the robot’s skull. Consequently, Searle argues, the robot cannot be credited with understanding of any of these worldly matters. In truth, it is not seeing or doing anything at all: it is ‘simply moving about as a result of its electrical wiring and its program’, which latter is instantiated by the man inside it, who ‘has no intentional states of the relevant type’ (1980, 420).
Searle’s argument here is unacceptable as a rebuttal of the Robot reply, because it draws a false analogy between the imagined example and what is claimed by computational psychology.
Searle-in-the-robot is supposed by Searle to be performing the functions performed (according to computational theories) by the human brain. But, whereas most computationalists do not ascribe intentionality to the brain (and those who do, as we shall see presently, do so only in a very limited way), Searle characterizes Searle-in-the-robot as enjoying full-blooded intentionality, just as he does himself. Computational psychology does not credit the brain with seeing beansprouts or understanding English: intentional states such as these are properties of people, not of brains. In general, although representations and mental processes are assumed (by computationalists and Searle alike) to be embodied in the brain, the sensorimotor capacities and propositional attitudes which they make possible are ascribed to the person as a whole. So Searle’s description of the system inside the robot’s skull as one which can understand English does not truly parallel what computationalists say about the brain.
Indeed, the specific procedures hypothesized by computational psychologists, and embodied by them in computer models of the mind, are relatively stupid—and they become more and more stupid as one moves to increasingly basic theoretical levels. Consider theories of natural-language parsing, for example. A parsing procedure that searches for a determiner does not understand English, and nor does a procedure for locating the reference of a personal pronoun: only the person whose brain performs these interpretive processes, and many others associated with them, can do that. The capacity to understand English involves a host of interacting information processes, each of which performs only a very limited function but which together provide the capacity to take English sentences as input and give appropriate English sentences as output. Similar remarks apply to the individual components of computational theories of vision, problem-solving, or learning. Precisely because psychologists wish to explain human language, vision, reasoning, and learning, they posit underlying processes which lack those very capacities.
In short, Searle’s description of the robot’s pseudo-brain (that is, of Searle-in-the-robot) as understanding English involves a category-mistake comparable to treating the brain as the bearer—as opposed to the causal basis—of intelligence.
Someone might object here that I have contradicted myself, that I am claiming that one cannot ascribe intentionality to brains and yet am implicitly doing just that. For I spoke of the brain’s effecting ‘stupid’ component-procedures—but stupidity is virtually a species of intelligence. To be stupid is to be intelligent, but not very (a person or a fish can be stupid, but a stone or a river cannot).
My defence would be twofold. First, the most basic theoretical level of all would be at the neuroscientific equivalent of the machine-code, a level ‘engineered’ by evolution. The facts that a certain light-sensitive cell can respond to intensity-gradients by acting as a DOG-detector and that one neurone can inhibit the firing of another are explicable by the biochemistry of the brain. Considered as biochemical facts, they leave no room for the notion of stupidity, even in scare-quotes. Considered as very basic information-processing functions, however, DOG-detecting and synaptic inhibition could properly be described as ‘very, very, very…stupid’. This of course implies that intentional language, if only of a highly grudging and uncomplimentary type, is applicable to brain processes after all—which prompts the second point in my defence. I did not say that intentionality cannot be ascribed to brains, but that full-blooded intentionality cannot. Nor did I say that brains cannot understand anything at all, in howsoever limited a fashion, but that they cannot (for example) understand English. I even hinted, several paragraphs ago, that a few computationalists do ascribe some degree of intentionality to the brain (or to the computational processes going on in the brain). These two points will be less obscure after we have considered the English reply and its bearing on Searle’s background assumption that computational theories are purely formal-syntactic.
The crux of the English reply is that the instantiation of a computer program, whether by man or by manufactured machine, does involve understanding—at least of the rule-book. Searle’s initial example depends critically on Searle-in-the-room’s being able to understand the language in which the rules are written, namely English; similarly, without Searle-in-the-robot’s familiarity with English, the robot’s beansprouts would never get thrown into the wok. Moreover, as remarked above, the vocabulary of English (and, for Searle-in-the-robot, of Chinese too) would have to be significantly modified to make the example work.
An unknown language (whether Chinese or Linear B) can be dealt with only as an aesthetic object or a set of systematically related forms. Artificial languages can be designed and studied, by the logician or the pure mathematician, with only their structural properties in mind (although D. R. Hofstadter’s (1979) example of the quasi-arithmetical pq-system shows that a psychologically compelling, and predictable, interpretation of a formal calculus may arise spontaneously). But one normally responds in a very different way to the symbols of one’s native tongue; indeed, it is very difficult to ‘bracket’ (ignore) the meanings of familiar words. The view held by computational psychologists, that natural languages can be characterized in procedural terms, is relevant here: words, clauses, and sentences can be seen as mini-programs. The symbols in a natural language one understands initiate mental activity of various kinds. To learn a language is to set up the relevant causal connections, not only between words and the world (‘cat’ and the thing on the mat) but between words and the many non-introspectible procedures involved in interpreting them.
Moreover, we do not need to be told ex hypothesi (by Searle) that Searle-in-the-room understands English: his behaviour while in the room shows clearly that he does. Or, rather, it shows that he understands a highly limited subset of English.
Searle-in-the-room could be suffering from total amnesia with respect to 99 per cent of Searle’s English vocabulary, and it would make no difference. The only grasp of English he needs is whatever is necessary to interpret (sic) the rule-book—which specifies how to accept, select, compare, and give out different patterns. Unlike Searle, Searle-in-the-room does not require words like ‘catalyse’, ‘beer-can’, ‘chlorophyll’, and ‘restaurant’. But he may need ‘find’, ‘compare’, ‘two’, ‘triangular’, and ‘window’ (although his understanding of these words could be much less full than Searle’s). He must understand conditional sentences, if any rule states that if he sees a squoggle he should give out a squiggle. Very likely, he must understand some way of expressing negation, temporal ordering, and (especially if he is to learn to do his job faster) generalization. If the rules he uses include some which parse the Chinese sentences, then he will need words for grammatical categories too. (He will not need explicit rules for parsing English sentences, such as the parsing procedures employed in AI programs for language-processing, because he already understands English.)
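The limited English the rule-book demands can be sketched as purely formal pattern-matching. The tokens and rules below are invented for illustration; the point is that following them requires only the abilities to compare shapes and hand out shapes:

```python
# Invented tokens standing in for Chinese characters: opaque shapes that
# Searle-in-the-room can compare but not interpret.
RULE_BOOK = [
    (("squiggle", "squoggle"), "squaggle"),
    (("squoggle",), "squiggle"),   # 'if you see a squoggle, give out a squiggle'
]

def answer(incoming):
    """Match the incoming pattern against each rule in turn and give out
    the prescribed response. Nothing here interprets what a token means;
    the work is pure shape-comparison."""
    for pattern, response in RULE_BOOK:
        if tuple(incoming) == pattern:
            return response
    return None   # no rule applies: hand nothing out
```

Understanding the conditional form of each rule is the only 'English' such a procedure embodies.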
In short, Searle-in-the-room needs to understand only that subset of Searle’s English which is equivalent to the programming-language understood by a computer generating the same ‘question-answering’ input-output behaviour at the window. Similarly, Searle-in-the-robot must be able to understand whatever subset of English is equivalent to the programming-language understood by a fully computerized visuomotor robot.
The two preceding sentences may seem to beg the very question at issue. Indeed, to speak thus of the programming-language understood by a computer is seemingly self-contradictory. For Searle’s basic premiss—which he assumes is accepted by all participants in the debate—is that a computer program is purely formal in nature: the computation it specifies is purely syntactic and has no intrinsic meaning or semantic content to be understood.
If we accept this premiss, the English reply sketched above can be dismissed forthwith for seeking to draw a parallel where no parallel can properly be drawn. But if we do not—if, pace Searle (and others: Fodor, 1980; Stich, 1983), computer programs are not concerned only with syntax—then the English reply may be relevant after all. We must now turn to address this basic question.
Certainly, one can for certain purposes think of a computer program as an uninterpreted logical calculus. For example, one might be able to prove, by purely formal means, that a particular well-formed formula is derivable from the program’s data-structures and inferential rules. Moreover, it is true that a so-called interpreter program that could take as input the list-structure ‘(FATHER (MAGGIE))’ and return ‘(LEONARD)’ would do so on formal criteria alone, having no way of interpreting these patterns as possibly denoting real people. Likewise, as Searle points out, programs provided with restaurant-scripts are not thereby provided with knowledge of restaurants. The existence of a mapping between a formalism and a certain domain does not in itself provide the manipulator of the formalism with any understanding of that domain.
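A minimal sketch of such an interpreter program, with invented database entries, shows how ‘(LEONARD)’ can be retrieved on formal criteria alone:

```python
# Invented entries. For the program these are uninterpreted patterns;
# only we map 'LEONARD' and 'MAGGIE' onto real people.
DATABASE = {
    ("FATHER", ("MAGGIE",)): ("LEONARD",),
    ("FATHER", ("LEONARD",)): ("HERBERT",),
}

def evaluate(query):
    """Retrieve an answer by the formal shape of the query alone."""
    return DATABASE.get(query)

result = evaluate(("FATHER", ("MAGGIE",)))   # formal retrieval, no understanding
```

The mapping from these tuples to a family is entirely in the eye of the human reader; the lookup itself is shape-matching.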
But what must not be forgotten is that a computer program is a program for a computer: when a program is run on suitable hardware, the machine does something as a result (hence the use in computer science of the words ‘instruction’ and ‘obey’). At the level of the machine-code the effect of the program on the computer is direct, because the machine is engineered so that a given instruction elicits a unique operation (instructions in high-level languages must be converted into machine-code instructions before they can be obeyed). A programmed instruction, then, is not a mere formal pattern—nor even a declarative statement (although it may for some purposes be thought of under either of those descriptions). It is a procedure specification that, given a suitable hardware context, can cause the procedure in question to be executed.
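A toy illustration of this point, with an invented three-instruction machine: each opcode is wired to exactly one operation, so to ‘obey’ an instruction just is to cause that operation to occur.

```python
def make_machine():
    """A toy machine with one register ('acc') and three invented opcodes,
    standing in for the machine-code level described in the text."""
    state = {"acc": 0}
    operations = {
        "INC": lambda: state.__setitem__("acc", state["acc"] + 1),
        "DEC": lambda: state.__setitem__("acc", state["acc"] - 1),
        "ZERO": lambda: state.__setitem__("acc", 0),
    }
    def run(program):
        for instruction in program:
            operations[instruction]()   # the instruction directly elicits its operation
        return state["acc"]
    return run

run = make_machine()
result = run(["INC", "INC", "INC", "DEC"])
```

Each instruction here is at once a formal token and a specification of a procedure that the hardware (simulated by the dictionary of operations) is guaranteed to execute.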
One might put this by saying that a programming-language is a medium not only for expressing representations (structures that can be written on a page or provided to a computer, some of which structures may be isomorphic with things that interest people) but also for bringing about the representational activity of certain machines.
One might even say that a representation is an activity rather than a structure. Many philosophers and psychologists have supposed that mental representations are intrinsically active. Among those who have recently argued for this view is Hofstadter (1985, 648), who specifically criticizes Newell’s account of symbols as manipulable formal tokens. In his words, ‘The brain itself does not “manipulate symbols”; the brain is the medium in which the symbols are floating and in which they trigger each other.’ Hofstadter expresses more sympathy for ‘connectionist’ than for ‘formalist’ psychological theories. Connectionist approaches involve parallel-processing systems broadly reminiscent of the brain, and are well suited to model cerebral representations, symbols, or concepts as dynamic. But it is not only connectionists who can view concepts as intrinsically active, and not only cerebral representations which can be thought of in this way: the claim has been generalized to cover traditional computer programs, specifically designed for von Neumann machines. The computer scientist B. C. Smith (1982) argues that programmed representations, too, are inherently active—and that an adequate theory of the semantics of programming-languages would recognize the fact.
At present, Smith claims, computer scientists have a radically inadequate understanding of such matters. He reminds us that, as remarked above, there is no general agreement—either within or outside computer science—about what intentionality is, and deep unclarities about representation as well. Nor can unclarities be avoided by speaking more technically, in terms of computation and formal symbol-manipulation. For the computer scientist’s understanding of what these phenomena really are is also largely intuitive. Smith’s discussion of programming-languages identifies some fundamental confusions within computer science. Especially relevant here is his claim that computer scientists commonly make too complete a theoretical separation between a program’s control-functions and its nature as a formal-syntactic system.
The theoretical divide criticized by Smith is evident in the widespread ‘dual-calculus’ approach to programming. The dual-calculus approach posits a sharp theoretical distinction between a declarative (or denotational) representational structure and the procedural language that interprets it when the program is run. Indeed, the knowledge-representation and the interpreter are sometimes written in two quite distinct formalisms (such as predicate calculus and LISP, respectively). Often, however, they are both expressed in the same formalism; for example, LISP (an acronym for LISt-Processing language) allows facts and procedures to be expressed in formally similar ways, and so does PROLOG (PROgramming-in-LOGic). In such cases, the dual-calculus approach dictates that the (single) programming-language concerned be theoretically described in two quite different ways.
To illustrate the distinction at issue here, suppose that we wanted a representation of family relationships which could be used to provide answers to questions about such matters. We might decide to employ a list-structure to represent such facts as that Leonard is the father of Maggie. Or we might prefer a frame-based representation, in which the relevant name-slots in the FATHER-frame could be simultaneously filled by ‘LEONARD’ and ‘MAGGIE’. Again, we might choose a formula of the predicate calculus, saying that there exist two people (namely, Leonard and Maggie), and Leonard is the father of Maggie. Last, we might employ the English sentence ‘Leonard is the father of Maggie.’
Each of these four representations could be written/drawn on paper (as are the rules in the rule-book used by Searle-in-the-room), for us to interpret if we have learnt how to handle the relevant notation. Alternatively, they could be embodied in a computer database. But to make them usable by the computer, there has to be an interpreter-program which (for instance) can find the item ‘LEONARD’ when we ‘ask’ it who is the father of Maggie. No one with any sense would embody list-structures in a computer without providing it also with a list-processing facility, nor give it frames without a slot-filling mechanism, logical formulae without rules of inference, or English sentences without parsing procedures. (Analogously, people who knew that Searle speaks no Portuguese would not give Searle-in-the-room a Portuguese rule-book unless they were prepared to teach him the language first.)
Smith does not deny that there is an important distinction between the denotational import of an expression (broadly: what actual or possible worlds can be mapped onto it) and its procedural consequence (broadly: what it does, or makes happen). The fact that the expression ‘(FATHER (MAGGIE))’ is isomorphic with a certain parental relationship between two actual people (and so might be mapped onto that relationship by us) is one thing. The fact that the expression ‘(FATHER (MAGGIE))’ can cause a certain computer to locate ‘LEONARD’ is quite another thing. Were it not so, the dual-calculus approach would not have developed. But he argues that, rather than persisting with the dual-calculus approach, it would be more elegant and less confusing to adopt a ‘unified’ theory of programming-languages, designed to cover both denotative and procedural aspects.
He shows that many basic terms on either side of the dual-calculus divide have deep theoretical commonalities as well as significant differences. The notion of variable, for instance, is understood in somewhat similar fashion by the logician and the computer scientist: both allow that a variable can have different values assigned to it at different times. That being so, it is redundant to have two distinct theories of what a variable is. To some extent, however, logicians and computer scientists understand different things by this term: the value of a variable in the LISP programming-language (for example) is another LISP-expression, whereas the value of a variable in logic is usually some object external to the formalism itself. These differences should be clarified—not least to avoid confusion when a system attempts to reason about variables by using variables. In short, we need a single definition of ‘variable’, allowing both for its declarative use (in logic) and for its procedural use (in programming). Having shown that similar remarks apply to other basic computational terms, Smith outlines a unitary account of the semantics of LISP and describes a new calculus (MANTIQ) designed with the unified approach in mind.
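The contrast Smith draws can be illustrated with a toy environment (the bindings below are invented). In a LISP-like setting the value of a variable is itself another expression of the formalism, and a variable can even be bound to the name of another variable, which is what makes reasoning about variables with variables possible:

```python
# Invented bindings. The value of 'X' is another symbolic expression,
# not an object outside the formalism; the value of 'Y' is the name
# of another variable.
env = {
    "X": ("FATHER", "MAGGIE"),
    "Y": "X",
}

def lookup(name):
    """Follow chains of variable-to-variable bindings: the system uses
    a variable ('name') to reason about the variables in 'env'."""
    value = env[name]
    while isinstance(value, str) and value in env:
        value = env[value]
    return value
```

A logician's semantics for the same notation would instead assign each variable an object in some external domain; Smith's point is that a unified theory should accommodate both readings.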
As the example of using variables to reason about variables suggests, a unified theory of computation could illuminate how reflective knowledge is possible. For, given such a theory, a system’s representations of data and of processes—including processes internal to the system itself—would be essentially comparable. This theoretical advantage has psychological relevance (and was a major motivation behind Smith’s work).
For our present purposes, however, the crucial point is that a fundamental theory of programs, and of computation, should acknowledge that an essential function of a computer program is to make things happen. Whereas symbolic logic can be viewed as mere playing around with uninterpreted formal calculi (such as the predicate calculus), and computational logic can be seen as the study of abstract timeless relations in mathematically specified ‘machines’ (such as Turing machines), computer science cannot properly be described in either of these ways.
It follows from Smith’s argument that the familiar characterization of computer programs as all syntax and no semantics is mistaken. The inherent procedural consequences of any computer program give it a toehold in semantics, where the semantics in question is not denotational, but causal. The analogy is with Searle-in-the-room’s understanding of English, not his understanding of Chinese.
This is implied also by A. Sloman’s (1986a; 1986b) discussion of the sense in which programmed instructions and computer symbols must be thought of as having some semantics, however restricted. In a causal semantics, the meaning of a symbol (whether simple or complex) is to be sought by reference to its causal links with other phenomena. The central questions are ‘What causes the symbol to be built and/or activated?’ and ‘What happens as a result of it?’ The answers will sometimes mention external objects and events visible to an observer, and sometimes they will not.
If the system is a human, animal, or robot, it may have causal powers which enable it to refer to restaurants and beansprouts (the philosophical complexities of reference to external, including unobservable, objects may be ignored here, but are helpfully discussed by Sloman). But whatever the information-processing system concerned, the answers will sometimes describe purely internal computational processes—whereby other symbols are built, other instructions activated. Examples include the interpretative processes inside Searle-in-the-room’s mind (comparable perhaps to the parsing and semantic procedures defined for automatic natural-language processing) that are elicited by English words, and the computational processes within a Schankian text-analysis program. Although such a program cannot use the symbol ‘restaurant’ to mean restaurant (because it has no causal links with restaurants, food and so forth), its internal symbols and procedures do embody some minimal understanding of certain other matters—of what it is to compare two formal structures, for example.
One may feel that the ‘understanding’ involved in such a case is so minimal that this word should not be used at all. So be it. As Sloman makes clear, the important question is not ‘When does a machine understand something?’ (a question which misleadingly implies that there is some clear cut-off point at which understanding ceases) but ‘What things does a machine (whether biological or not) need to be able to do in order to be able to understand?’ This question is relevant not only to the possibility of a computational psychology, but to its content also.
In sum, my discussion has shown Searle’s attack on computational psychology to be ill founded. To view Searle-in-the-room as an instantiation of a computer program is not to say that he lacks all understanding. Since the theories of a formalist-computational psychology should be likened to computer programs rather than to formal logic, computational psychology is not in principle incapable of explaining how meaning attaches to mental processes.
Frances Egan
1995
The dominant program in cognitive psychology since the demise of behaviorism in the 1960s has been computationalism. Computational theories treat human cognitive processes as a species of information processing, and the systems that implement such processing as symbol-manipulating systems. Describing a device as a symbol manipulator implies that it is possible to treat some of its internal states as representations of properties or objects in a particular domain. Computational theories of vision, for example, posit internal states that can be interpreted as representing the depth of the distal scene.
There has been considerable disagreement about the nature and function of representational contents assigned to the states posited by computational theories. It is widely thought that such theories respect what Jerry Fodor (1980) has called the “formality condition,” which requires that computational processes have access only to the formal (that is, nonsemantic) properties of the representations over which they are defined. It is by respecting the formality condition that computationalism promises to answer one of the most pressing problems in the philosophy of mind—how can representational mental states be causally efficacious in the production of behavior? Representational mental states, according to computationalism, have their causal roles in virtue of (roughly) their structural properties.1 But this advantage comes at a price. The formal character of computational description appears to leave no real work for the semantic properties of the mental states it characterizes. Thus, computationalism has been thought by some to support a form of eliminativism, the thesis that denies that intentionally characterized states play a genuinely explanatory role in psychology (see, for example, Stich, 1983). If the content of computational states is indeed explanatorily idle, then the relation between psychological states, as characterized by computational psychology, and psychological states as characterized by our commonsense explanatory practices, which do advert to content, is quite obscure.
In this paper I articulate and defend a strategy for reconciling the formal character of computational description with a commitment to the explanatory usefulness of mental content. I argue that content does not play an individuative or taxonomic role in computational theories—a computational characterization of a process is a formal characterization. Nonetheless, content does play a genuine explanatory role in computational accounts of cognitive capacities. Content ascriptions connect the formal characterization of an internal process with the subject’s environment, enabling the computational theory to explain how the operation of the process constitutes the exercise of a cognitive capacity in that environment. I support my account of the role of content in computational psychology by reference to David Marr’s theory of early vision,2 in part because it has received a great deal of attention from philosophers; however, my argument depends on general features of computational methodology, and so applies to computational theories generally.
Recent attempts to reconcile computation and content have appealed to a notion of narrow content, that is, content that supervenes on intrinsic physical states of the subject. Proponents of narrow content have so far failed to articulate a notion that is clearly suitable for genuine explanatory work in psychology.3 I argue that it is typically broad content that plays a central role in computational explanation, though I do identify a specific (and limited) function served by narrow content ascription.
It might be argued that, the formality condition notwithstanding, computational theories are intentional in the following sense: The states they posit not only have representational content, but the content they have plays an individuative role in the theory. In other words, computational theories taxonomize states by reference to their contents.
The motivation for the claim that computational theories of cognition are intentional in the above sense is not hard to understand. Consider the following passages:
There is no other way to treat the visual system as solving the problem that the theory sees it as solving than by attributing intentional states that represent objective physical properties. (Burge, 1986, 28–29)

[I]t is at least arguable that where rational capacities are the explananda, it is necessary that there be propositional attitudes in the explanans. If this argument is correct, then it is pragmatically incoherent for Stich and his followers to insist that cognitive psychology explains rational capacities by reference to states not described as possessing propositional content. (Hannan, 1993)
The argument underlying both passages can be expressed somewhat crudely as follows:
(P) The explananda of computational psychological theories are intentionally characterized capacities of subjects.
(C) Therefore, computational psychological theories are intentional—they posit intentional states.
Underlying the argument is the intuition that scientific explanations should “match” (in some sense) their explananda. Wilson endorses a constraint of this sort, which he calls theoretical appropriateness.
An explanation is theoretically appropriate when it provides a natural (e.g. non-disjunctive) account of a phenomenon at a level of explanation matching the level at which that phenomenon is characterized [in the explanandum]. (1994, 57)
The notion of a “level of explanation” is somewhat vague, but let us assume, for the sake of argument, that there is a unique level of explanation such that all and only explanations at that level involve ascriptions of content. If theoretical appropriateness is a desideratum of scientific explanation, then an explanation of intentionally characterized phenomena should itself advert to intentionally characterized states.4
An unresolved tension surfaces, though, when we consider computational explanation. The fact that the explananda of computational theories are intentionally specified suggests that computational states are essentially individuated by reference to their contents. If computational theories are not intentional, then how can computational theories explain intentionally characterized phenomena? But the formality condition exerts an opposite pressure. In requiring that computational processes have access only to the nonsemantic properties of the representational states over which they are defined, it suggests that computational individuation is nonsemantic, or in Fodor’s terminology, formal. Are computational taxonomies intentional or formal? At this point it is helpful to turn to a well-developed example.
Interpreters of Marr’s theory of vision have assumed that visual states are individuated in the theory by reference to their contents, hence that the theory is intentional (see Burge, 1986; Kitcher, 1988; Segal, 1989, 1991; Davies, 1991; Morton, 1993; Shapiro, 1993). Although there has been a good deal of disagreement about the sort of content (broad or narrow) that Marrian structures have, the assumption that content plays an individuative role in the theory has not been thought to require explicit argument.5 Burge says that it is “sufficiently evident” that the theory is intentional from the fact that “the top levels of the theory are explicitly formulated in intentional terms” (1986, 55). I shall argue that in construing content as individuative, interpreters of Marr have misconstrued the role of content in computational theories.
While it is true that in his informal exposition of the various visual processes Marr typically characterizes them by reference to features of the distal scene, one should not read too much into this fact. The processes are also characterized formally. They have to be—Marr’s theory of vision is a computational theory, and a formal characterization guarantees that they are programmable (hence, physically realizable). The question is, which characterization does the individuative work?
Marr argued persuasively that an information-processing system should be analyzed at three distinct levels of description. The “top” level, which Marr called the theory of the computation, is a characterization of the function computed by the system—what the system does. The algorithmic level specifies an algorithm for computing the function, and the implementation level describes how the process is realized physically.6 The top level in Marr’s hierarchy is sometimes identified with Pylyshyn’s semantic level (Pylyshyn, 1984) and Newell’s knowledge level (Newell, 1982). In other words, the theory of the computation has been construed as essentially an intentional or semantic characterization of a mechanism. But such a construal makes somewhat puzzling Marr’s insistence that the search for the algorithm must await the precise specification of the theory of the computation. He says, “unless the computational theory of the process is correctly formulated, the algorithm will almost certainly be wrong” (1982, 124), suggesting that the top level should be understood to provide a function-theoretic characterization of the device. Indeed, Marr explicitly points out that the theory of the computation is a mathematical characterization of the function(s) computed by the various processing modules. In describing the mathematical formula that characterizes the initial filtering of the image (the calculation of the Laplacian of the image convolved with a Gaussian), Marr says the following:
I have argued that from a computational point of view [the retina] signals ∇²G * I (the X channels) and its time derivative ∂/∂t(∇²G * I) (the Y channels). From a computational point of view, this is a precise characterization of what the retina does. Of course, it does a lot more—it transduces the light, allows for a huge dynamic range, has a fovea with interesting characteristics, can be moved around, and so forth. What you accept as a reasonable description of what the retina does depends on your point of view. I personally accept ∇²G as an adequate description, although I take an unashamedly information-processing point of view. (1982, 537)
∇²G is a function that takes as arguments two-dimensional intensity arrays I(x, y) and has as values the isotropic rates of change of intensity at points (x, y) in the array. The implementation of this function is used in Marr and Hildreth’s (1980) model of edge detection to detect zero-crossings. (A zero-crossing is a point where the value of a function changes its sign. Zero-crossings correspond to sharp intensity changes in the image.) Marr grants that the mathematical specification of the function computed by the retina may not make what the retina does perspicuous. Nonetheless, from an information-processing point of view, the formal specification is “adequate.” More precisely, it is the description upon which the correct specification of the algorithm crucially depends.
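The filtering-and-zero-crossing story can be sketched in one dimension. (This is a toy illustration with made-up numbers: the ∇²G operation is approximated here by Gaussian smoothing followed by a discrete Laplacian, not Marr and Hildreth's actual two-dimensional operator.)

```python
# One-dimensional sketch of Marr-Hildreth-style edge detection:
# smooth, take the second derivative, and mark sign changes.
import math

def gaussian_kernel(sigma, radius):
    """Discrete 1-D Gaussian weights, normalized to sum to 1."""
    w = [math.exp(-x * x / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def convolve(signal, kernel):
    """'Valid' convolution: the output shrinks by 2 * radius."""
    r = len(kernel) // 2
    return [sum(signal[i + j - r] * kernel[j] for j in range(len(kernel)))
            for i in range(r, len(signal) - r)]

def laplacian(signal):
    """Discrete second derivative: the [1, -2, 1] stencil."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    """Indices where the filtered signal changes sign: candidate edges."""
    return [i for i in range(1, len(values))
            if values[i - 1] * values[i] < 0]

# A step in intensity: a dark region next to a bright one.
row = [10.0] * 10 + [100.0] * 10
smoothed = convolve(row, gaussian_kernel(sigma=1.0, radius=2))
edges = zero_crossings(laplacian(smoothed))
```

The filtered signal is exactly zero over the flat regions and changes sign once, at the step, so the single zero-crossing picks out the sharp intensity change, which is the formal fact Marr's theory exploits.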
The claim that the top level provides a mathematical characterization does not imply that it is wrong to speak of the visual system as taking representations of light intensity values as input and yielding representations of shape as output. I am not denying that computational processes have true intentional (semantic) descriptions. For some purposes, as we shall see in the next section, an intentional description of a process will be preferable to a formal characterization. It is not incorrect to say that an intentional characterization of the function computed by a mechanism resides at the top level in Marr’s hierarchy, although the intentional characterization provides an extrinsic description of what the device does, and does not individuate the computational process. For the purpose of individuation, the precise mathematical description given by the theory of the computation is the description that counts.7,8
If, as I have argued, the top level of a computational account provides a purely mathematical characterization of a device, then there is little temptation to construe the second, or algorithmic, level as intentional. (Only Burge, as far as I know, construes the algorithmic level as intentional, apparently because in discussing various possible algorithms Marr sometimes employs intentional language.) The algorithmic level of the theory simply specifies how the function characterized in mathematical terms at the top level is computed by the system.
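The division of labor between Marr's top two levels can be illustrated with a standard toy example (not from Marr): a single function-theoretic specification of what is computed admits distinct algorithmic-level specifications of how it is computed.

```python
# Top level (theory of the computation): WHAT function is computed.
def computation(xs):
    """The sorted permutation of xs, specified extensionally."""
    return sorted(xs)

# Algorithmic level: ONE way of computing that same function.
def insertion_sort(xs):
    """Insertion sort, a particular algorithmic realization."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:  # scan back to x's slot
            i -= 1
        out.insert(i, x)                 # place x in sorted position
    return out
```

Mergesort, quicksort, and so on would be alternative algorithmic-level answers to the same top-level specification, which is why, on Marr's view, getting the theory of the computation right must precede the search for the algorithm.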
I have argued that Marr’s theory of vision is not intentional. My argument appeals to general features of computational methodology; if I am right, then computational theories of cognition are not intentional—the states and processes characterized by such theories are not individuated by reference to the representational contents ascribed to them. The formal—namely mathematical—characterization does the taxonomic work.
Let us consider for a moment the implications of the claim that computational theories are not intentional. Two mechanisms that compute the same mathematical function, using the same algorithm, are, from a computational point of view, the same mechanism, even though they may be deployed in quite different environments. A computational description is an environment-independent characterization of a mechanism.9 Inasmuch as computational processes are generally construed as modular processes, even the internal environment is irrelevant to the type-individuation of a computational process. Imagine a component of the visual system, called the visex, that computes a representation of the depth of the visual scene from information about binocular disparity.10 Now imagine that within the auditory system of some actual or imagined creature there is a component that is physically identical to the visex. Call this component the audex. According to the theory of auditory processing appropriate to this creature, the audex computes a representation of certain sonic properties. We can imagine a particular visex and audex removed from their normal embeddings in visual and auditory systems respectively and switched. Since the two components are by hypothesis physically identical, they compute the same class of functions. The switch will make no discernible difference to the behavior of the creatures, nor to what is going on inside their heads. The two mechanisms are computationally identical, despite the difference in their normal internal environments.
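The visex/audex thought experiment can be put schematically. (A hypothetical sketch: the names and the particular function are invented here purely for illustration.) One environment-independent mathematical characterization underlies two different intentional descriptions, supplied by the surrounding systems.

```python
# The environment-independent, mathematical characterization:
# a fixed function of two magnitudes.
def mechanism(a, b):
    return abs(a - b)

def visex(left_pos, right_pos):
    # Embedded in a visual system, the inputs are retinal positions
    # and the output is described as binocular disparity (hence depth).
    return mechanism(left_pos, right_pos)

def audex(left_arrival, right_arrival):
    # Embedded in an auditory system, the very same computation is
    # described as an interaural difference in some sonic property.
    return mechanism(left_arrival, right_arrival)
```

Swapping the two wrappers changes nothing about what is computed; only the extrinsic, intentional description differs, which is the sense in which the two mechanisms are computationally identical.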
It will perhaps be noted that the visual theory that describes the visex characterizes it as computing a representation of depth from disparity, and not as computing a representation of certain sonic properties, although it would do the latter if it were embedded in a different internal environment. The important point is that the postulated structures have no content considered independently of the environment (internal and external) in which they are normally situated. This is the sense in which an intentional characterization of a computational process is an extrinsic description. Structures in the raw primal sketch, which contains information from several distinct ∇²G channels and provides the input to most of the modular processes characterized by Marr’s theory, are reliably correlated with such salient distal properties as object boundaries or changes in illumination, and are described by Marr as representing these properties. In some radically different environment, however, the same structures may be correlated with different distal properties, or perhaps with no objective feature of the world. In the latter world, the structures would not represent anything, except perhaps features of the image. They would have no distal content in that world.
The point I wish to underscore is that an intentional characterization of a computational mechanism involves an implicit relativization to the context in which the mechanism is normally embedded. The mathematical characterization provided by the theory of the computation does not. Only the mathematical characterization picks out an essential property of a computational mechanism. The intentional characterization is not essential, since in some possible circumstances it would not apply.
What, then, is the role that representational content plays in computational accounts of cognitive processes, if not to essentially characterize cognitive processes? I have argued elsewhere (Egan, 1992) that semantic interpretations play a role in computational psychology analogous to the role played by explanatory models in the physical sciences. There are two senses in which this is true. In the first place, an intentional characterization of an essentially formal process serves an expository function, explicating the formal account, which might not itself be perspicuous. Secondly, when a theory is incompletely specified (as is Marr’s theory), the study of a model of the theory can often aid in the subsequent elaboration of the theory itself. A computational theorist may resort to characterizing a computation partly by reference to features of some represented domain, hoping to supply the formal details (i.e., the theory) later.
Though the analogy with models in physics is, I think, interesting and useful, the most important function served by intentional interpretations of computational processes is unique to psychology. The questions that antecedently define a psychological theory’s domain are usually couched in intentional terms. For example, we want a theory of vision to tell us, among other things, how the visual system can detect depth from information contained in two-dimensional images. An intentional specification of the postulated computational processes demonstrates that these questions are indeed answered by the theory. It is only under an interpretation of some of the states of the system as representations of distal properties (like depth, or surface reflectance) that the processes given a mathematical characterization by a computational theory are revealed as vision. Thus content ascriptions play a crucial explanatory role: we need them to explain how the operation of a formally characterized process constitutes the exercise of a cognitive capacity in the environment in which the process is normally deployed.
Let us return for a moment to the argument considered earlier for the claim that computational theories of cognition are intentional:
(P) The explananda of computational psychological theories are intentionally characterized capacities of subjects.
(C) Therefore, computational psychological theories are intentional—they posit intentional states.
The premise of the argument is true—the questions that define a psychological theory’s explanatory domain are usually couched in intentional terms—but, as we have seen, it does not follow that the theory characterizes the states and processes it describes as necessarily intentional. Computational states and processes will typically have no true intentional description when considered independently of an environment. Intentional characterizations are therefore not part of the individuative apparatus of computational theories. In this sense, (C) is false. Yet the argument does contain an important insight: an intentional characterization is needed to connect a computational theory with its pretheoretic explananda. An explanation of how the visual system detects the depth of the scene from information contained in two-dimensional images is forthcoming only when the states characterized in formal terms by the theory are construed as representations of distal properties.
But, one might object, isn’t this crucial explanatory role played by an intentional interpretation of a computational process enough to make the computational theory intentional? Indeed, it might seem that a computational theory, when divorced from the intentional interpretation that secures its explanatory relevance, cannot properly be characterized as a theory of cognition. There is a sense in which this is true; however, it does not undermine my point that computational theories are not intentional. Let me explain.
A computational theory provides a mathematical characterization of the function computed by a mechanism, but only in some environments can this function be characterized as a cognitive function (that is, a function whose arguments and values are epistemically related, such that the outputs of the computation can be seen as rational or cogent given the inputs). An example will make the point clearer. The matching of stereo images essential to the computation of depth from binocular disparity is aided, according to Marr, by a fundamental fact about our world—that disparity varies smoothly, because matter is cohesive. This is an example of what Marr calls a natural constraint (in particular, the continuity constraint). In some environments, the constraints that enable a cognitive interpretation of the mathematical function computed by a mechanism will not be satisfied. In environments where the continuity constraint is not satisfied (a spiky universe), the stereopsis module would compute the same formally characterized function, but it would not be computing depth from disparity. The function might have no cognitive (i.e., rational) description in this environment. A computational theory prescinds from the actual environment because it aims to provide an abstract, and hence completely general, description of a mechanism that affords a basis for predicting and explaining its behavior in any environment, even in environments where what the device is doing cannot comfortably be described as cognition. When the computational characterization is accompanied by an appropriate intentional interpretation, we can see how a mechanism that computes a particular mathematical function can, in a particular context, subserve a cognitive function such as vision.
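As a rough illustration of the geometry the stereopsis example presupposes (textbook pinhole-camera triangulation with invented numbers, not Marr's matching algorithm): once images are matched, depth varies inversely with disparity, and it is the continuity constraint that licenses the matching in the first place.

```python
# Standard stereo triangulation: Z = f * b / d, where f is focal
# length, b the baseline between the two eyes/cameras, and d the
# measured disparity. (Illustrative parameter values only.)
def depth_from_disparity(disparity, baseline=0.06, focal=0.05):
    """Depth in meters from disparity (all lengths in meters)."""
    return focal * baseline / disparity

# Smoothly varying disparities, as the continuity constraint leads
# us to expect in a world of cohesive matter, yield smoothly
# varying depths.
disparities = [0.0010, 0.0011, 0.0012]
depths = [depth_from_disparity(d) for d in disparities]
```

In a "spiky" universe the same formula could still be evaluated, but with the matching unconstrained its outputs would no longer deserve the description "depth," which is the sense in which the cognitive characterization is environment-relative.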
A computational theory explains a cognitive capacity by subsuming the mechanism that has that capacity under an abstract computational description. Explaining a pretheoretically identifiable capacity by reference to a class of devices that have an independent, theoretical, characterization is an explanatory strategy familiar from other domains, particularly biology. The ability of sand sharks to detect prey is explained by positing within the shark the existence of an electric field detector, a device whose architecture and behavior are characterized by electromagnetic theory. Electromagnetic theory does most of the explanatory work in the biological explanation of the shark’s prey-detecting capacity. Of course, the explanation appeals to other facts—for example, that animals, but not rocks and other inanimate objects in the shark’s natural environment, produce significant electric fields—but no one would suggest that such facts are part of electromagnetic theory. Similarly, by specifying the class of computational devices to which a mechanism belongs and providing an independent (i.e., noncognitive) characterization of the behavior of this class, a computational theory bears the primary explanatory burden in the explanation of a cognitive capacity. The intentional interpretation of the process also plays an explanatory role—it demonstrates that the capacity has been explained—but playing an essential role in the cognitive explanation does not thereby make it part of the computational theory proper.
So does it follow that computational theories are not cognitive? It depends. If a theory must give a cognitive characterization of a mechanism (according to which computing a cognitive function is a necessary property of the mechanism) to be a cognitive theory, then computational theories are not cognitive. If bearing the primary explanatory burden in an explanation of a cognitive capacity is sufficient, then they typically are.
Let us return briefly to Wilson’s theoretical appropriateness condition, the requirement that a scientific explanation characterize a phenomenon at the same level in the explanans as in the explanandum. The above account of computational explanation suggests that theoretical appropriateness is not a general constraint on scientific explanation. Computational explanations characterize cognitive capacities in nonintentional, formal, terms. The requirement is independently implausible in any case, since it would rule out not only reductive explanations (e.g., microreductions) of antecedently characterized phenomena, but also explanation by functional analysis, the predominant form of explanation in both cognitive psychology and biology.11 Such explanations typically analyze complex capacities or processes into more basic, less specialized, elements. For example, the explanation of the capacity to do long division appeals to the ability to copy numerals and perform multiplication and subtraction. The explanation of digestion appeals to more basic chemical processes. Both of these explanations appear to violate Wilson’s theoretical appropriateness condition. Though Wilson grants that theoretical appropriateness is a defeasible constraint on scientific explanation, the ubiquity of explanations of this sort suggests that it is not a constraint at all.
An interpretation of a computational system is given by an interpretation function fI that specifies a mapping between the postulated structures of the system and elements of some represented domain. For example, to interpret a device as an adder involves specifying an interpretation function fI that pairs states of the device with numbers. The device can plausibly be said to represent elements in the domain only if there exists an interpretation function that maps formally characterized structures to these elements in a fairly direct way.
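The idea can be sketched concretely. The following toy example (the device, its state names, and the helper functions are hypothetical, invented purely for illustration) defines a formally characterized device and an interpretation function fI under which the device counts as an adder (modulo 4, since the device is finite):

```python
# A minimal sketch: a formally characterized device is just an operation
# over uninterpreted states; an interpretation function fI maps those
# states onto numbers, under which the device computes addition mod 4.

STATES = ["s0", "s1", "s2", "s3"]

def device_op(a: str, b: str) -> str:
    """The device's behavior, described purely formally (no semantics)."""
    return STATES[(STATES.index(a) + STATES.index(b)) % 4]

# Interpretation function fI: pairs states of the device with numbers.
f_I = {"s0": 0, "s1": 1, "s2": 2, "s3": 3}

def interpreted_as_addition(a: str, b: str) -> bool:
    """Check that, under fI, device_op realizes addition (mod 4)."""
    return f_I[device_op(a, b)] == (f_I[a] + f_I[b]) % 4

# The mapping is direct and structure-preserving for every pair of states.
assert all(interpreted_as_addition(a, b) for a in STATES for b in STATES)
```

The mapping here is "fairly direct" in the relevant sense: reading a number off a state requires no further computation by the interpreter.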
Since an interpretation is just a structure-preserving mapping between formally characterized elements and elements of some represented domain, there is no reason to think that the interpretation of a computational system will be unique. The non-uniqueness of computational interpretation has been thought to be a problem for computationalism, but in fact it is not. Most “unintended” interpretations will not meet the directness requirement.12 More importantly, the plausibility of a computational account depends only on the existence of an interpretation that does explanatory work.13
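A toy illustration of this non-uniqueness (the device and both mappings are invented for the purpose): the very same formal state-transition structure admits two equally direct, structure-preserving interpretations onto different represented domains.

```python
# One formal device, two interpretations: the same state-transition
# structure can be read as adding numbers mod 4 or as composing
# quarter-turn rotations. Neither reading is privileged by the formalism.

STATES = ["s0", "s1", "s2", "s3"]

def device_op(a: str, b: str) -> str:
    return STATES[(STATES.index(a) + STATES.index(b)) % 4]

# Interpretation 1: states represent numbers; the device adds modulo 4.
as_number = {"s0": 0, "s1": 1, "s2": 2, "s3": 3}

# Interpretation 2: states represent rotations (in degrees); the device
# composes them. Both mappings are direct and structure-preserving.
as_rotation = {"s0": 0, "s1": 90, "s2": 180, "s3": 270}

a, b = "s1", "s3"
out = device_op(a, b)
assert as_number[out] == (as_number[a] + as_number[b]) % 4          # 1 + 3 = 0 (mod 4)
assert as_rotation[out] == (as_rotation[a] + as_rotation[b]) % 360  # 90 + 270 = 0
```

On the account in the text, this multiplicity is harmless: what matters is that some interpretation does the explanatory work demanded by the theory's pretheoretic explananda.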
If the above account of the explanatory role of content is correct, then the interpretation of a computational system should connect the formal apparatus of the theory with its pretheoretic explananda. This requirement will constrain the choice of an appropriate interpretation function. A computational theory that purports to explain our arithmetical abilities cannot plausibly claim to have done so unless some of the states it postulates are interpretable as representing numbers. The fact that the system could also be interpreted as charting the progress of the Six-Day War (to use an example of Georges Rey’s) would not undermine the theorist’s claim to have described an arithmetical system, assuming that the mechanism can be consistently and directly interpreted as computing the appropriate arithmetical functions. Given the explanatory role of intentional interpretation as characterized in the previous section, the existence of “unintended” interpretations of computational systems is irrelevant. The preexisting explananda of the theory set the terms for the ascription of content.
Consider what this means for theories that purport to explain our perceptual capacities. The cognitive tasks that define the domains of theories of perception are typically specified in terms of the recovery of certain types of information about the subject’s normal environment. Interpreting states of the system as representing environment-specific properties demonstrates that the theory explains how the subject is able to recover this information in its normal environment. Consequently, we should expect the contents ascribed to computationally characterized perceptual states to be broad, that is, not shared by physically identical subjects in significantly different environments.
It has been argued by Fodor (e.g., 1980, 1984, 1987) and others (e.g., Block, 1986; and Cummins, 1989) that computational psychology must restrict itself to a notion of narrow content, that is, content that supervenes on intrinsic physical states of the subject.14 In part, the motivation for such a view is the recognition that computational taxonomy prescinds from the subject’s normal environment. Physical duplicates are computational duplicates. Given this fact, if computational states have their semantic properties essentially, then computational psychology requires a notion of content that supervenes on the physical properties of the system; in other words, it needs a notion of narrow content. But if, as I have argued, computational states have their semantic properties only nonessentially, then narrow content is not necessary. And it turns out that there are good reasons why computational psychology should not restrict itself to narrow content.
In the first place, a useful notion of narrow content has been notoriously hard to specify. More importantly, since the explananda of theories of perception are typically formulated in environment-specific terms, ordinary environment-specific broad contents will best serve the explanatory goals of such theories. The point can be generalized. It is widely appreciated that ordinary contents are broad. Insofar as the pretheoretic explananda of computational theories are framed in ordinary terms, the ascription of broad content to computational states and structures will be appropriate.
A close look at Marr’s theory confirms the point. He ascribes broad, environment-specific contents where possible. If in a subject’s normal environment a structure is reliably correlated with a salient distal property, then Marr describes the structure as representing that property. (For example, he describes structures in the 2.5D sketch as representing surface orientation.) Some of the structures posited by Marr’s theory correlate with no simple distal property tokening in the subject’s normal environment. The structures that Marr calls edges sometimes correlate with changes in surface orientation, sometimes with changes in depth, illumination, or reflectance. Marr describes edges as representing this disjunctive distal property. Notice that in both cases—correlation with a simple distal property in the subject’s normal environment or correlation with a disjunctive distal property in the subject’s normal environment—the contents ascribed to the representational structures are broad. Moreover, the broad contents so ascribed are determined by the correlations that obtain in the subject’s normal environment, not by those that would obtain in some other environment.
Some of the structures that Marr posits (e.g., individual zero-crossings) do not, however, correlate with any easily characterized distal property, simple or disjunctive, in the subject’s normal environment. Some of their tokenings correlate with distal properties, others appear to be mere artifacts of the imaging process. Marr recognizes this fact, cautioning that such structures as zero-crossings are not “physically meaningful”; he describes them as representing discontinuities in the image. Their contents are only proximal, and hence narrow—they supervene on the intrinsic properties of the subject. But such proximal or narrow content, far from being Marr’s content of choice, is his content of last resort, since he ascribes proximal content only when a broad content ascription is unavailable.15
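The zero-crossing idea can be sketched in one dimension (a simplification: Marr's actual account applies a Laplacian-of-Gaussian filter to 2-D images, and the intensity profile below is invented for illustration). A sign change in a second-difference filter marks a discontinuity in the input profile, whatever its distal source:

```python
# A rough 1-D illustration of Marr-style zero-crossings. A second
# difference of an intensity profile changes sign where the profile has
# a discontinuity; the zero-crossing represents a feature of the image
# (proximal content), not any particular distal cause.

def second_difference(signal):
    """Discrete second derivative, defined on interior samples."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    """Indices where the filtered signal changes sign."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]

# A step edge: intensity jumps from 1 to 5 between samples 3 and 4.
intensity = [1, 1, 1, 1, 5, 5, 5, 5]
filtered = second_difference(intensity)
print(zero_crossings(filtered))  # prints [2]: a sign change straddling the edge
```

The filter locates the discontinuity, but nothing in the formal story says whether that discontinuity is a depth edge, a shadow, or an imaging artifact, which is why Marr interprets individual zero-crossings only proximally.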
Covariational (or information-theoretic) theories of content identify the meaning of a representational state with the cause of the state’s tokening in certain specifiable circumstances.16 The foregoing account of content ascription in Marr’s theory may tempt some to find in his theory a tacit endorsement of a covariational theory of content. This would be a mistake. I have claimed that in ascribing content Marr looks for salient distal correlates of a structure’s tokening in the subject’s normal environment. I have been careful to avoid claiming that these correlates are the cause of the structure’s tokening. Though it may be natural to say that they are, Marr makes no such claim, and a number of well-known problems are avoided by not doing so.17 It should be clear that Marr’s theory is not committed to a covariational theory of content if one considers the sort of case where no salient distal correlate (simple or disjunctive) of a structure’s tokening can be found. In such cases, Marr ascribes a proximal content to the structure, interpreting it as representing a feature of the image or input representation rather than the distal cause of its tokening, whatever that might be. The ascription of proximal content serves an important expository purpose—it makes the computational account of the device more perspicuous, by allowing us to keep track of what the device is doing at points in the processing where the theory posits structures that do not correlate neatly with a salient distal property.18 No explanatory purpose would be served by an unperspicuous distal interpretation of these structures; consequently, Marr does not interpret them as representing their distal causes. The decision to adopt a proximal rather than a distal interpretation is dictated by purely explanatory considerations.19
The compatibility of computational description and broad content seems to have gone unnoticed in the literature. Cummins (1989), whose account of representation in computational psychology bears some resemblance to mine, says the following:
The CTC [computational theory of cognition]…seeks an individualist psychology, i.e., a psychology that focuses on cognitive capacities of the kind that might be brought to bear on radically different environments. If the anti-individualist position with regard to intentionality is right (i.e. if beliefs and desires cannot be specified in a way that is independent of environment), then the explananda of an individualist psychology cannot be specified intentionally. It follows that the CTC shouldn’t—indeed, mustn’t—concern itself with intentionally specified explananda. (140)
Cummins’s mistake is in thinking that the fact that a computational theory seeks to provide a nonintentional, environment-independent characterization of a cognitive process entails that it cannot explain phenomena specified in environment-specific terms. This, we have seen, is wrong. A computational theory explains an environment-specific cognitive capacity by subsuming it under an environment-independent characterization. The intentional interpretation of the process serves as a bridge between the abstract characterization provided by the theory and the environment-specific intentional characterization that constitutes the theory’s explananda. Precisely because the intentional interpretation does not play an essentially individuative role in the theory—in other words, whatever contents computational states have, they have them non-essentially—the theorist is free to assign broad contents where appropriate to secure the connection between theory and explananda.
The fact that the computational theorist can and typically will assign broad contents to computational structures has larger implications for psychology. In “Methodological Solipsism Considered as a Research Strategy in Cognitive Psychology,” Fodor says the following:
there is room for both a computational psychology—viewed as a theory of formal processes defined over mental representations—and a naturalistic psychology, viewed as a theory of the (presumably causal) relations between representations and the world which fix the semantic interpretations of the former. I think that in principle this is the right way to look at things…however…it’s overwhelmingly likely that computational psychology is the only one that we are likely to get. [A] naturalistic psychology isn’t a practical possibility and isn’t likely to become one. (1980, 66)
Naturalistic psychology, as Fodor construes it, is the theory of organism/environment relations that fix the meanings of our mental terms. He offers two arguments for the claim that a naturalistic psychology is impossible. As both arguments have been thoroughly worked over in the literature (see the commentaries that accompany Fodor, 1980), I won’t go into them here. But as far as I know, no one has disputed Fodor’s implication that computational psychology and naturalistic psychology are entirely unrelated projects. If my account of the role of content in computational psychology is correct, then Fodor’s way of conceiving things is wrong. If we had a complete computational psychology, that is, a computational account of each human cognitive capacity, we would ipso facto already have a naturalistic psychology. Let me elaborate.
Although a computational theory provides a formal, environment-independent, characterization of a process, the theorist will usually be unable to discover the correct formal characterization without investigating the subject’s normal environment. Typically, a necessary first step in specifying the function computed by a cognitive mechanism is discovering environmental constraints that make the computation tractable. The solutions to information-processing problems are often underdetermined by information contained in the input to the mechanism; the solution is achieved only with the help of additional information reflecting very general features of the subject’s normal environment. For example, as previously mentioned, the computation of depth from binocular disparity is possible only because the mechanism is built to assume something that is true about its normal environment—that disparity varies smoothly because matter is cohesive (the continuity constraint). Finding constraints of this very general sort is a necessary first step in characterizing the mathematical problem that the mechanism has to solve, and thus in arriving at a correct computational description of the process.
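A toy sketch of how such a constraint does its work (the candidate data and the helper function are hypothetical; Marr's cooperative stereo algorithm is far more involved). Stereo matching is underdetermined because each left-image point has several candidate right-image matches; the continuity constraint selects the assignment whose disparities vary most smoothly:

```python
# The continuity constraint as a selection principle: among ambiguous
# candidate matches, prefer the disparity assignment with the least
# point-to-point disparity change. Brute force over a tiny example.

from itertools import product

def match_disparities(candidates):
    """Pick one disparity per point, minimizing total disparity change."""
    best = min(product(*candidates),
               key=lambda ds: sum(abs(a - b) for a, b in zip(ds, ds[1:])))
    return list(best)

# Three neighboring points, each with ambiguous candidate disparities.
candidates = [[2, 7], [2, 3, 8], [3, 7]]
print(match_disparities(candidates))  # prints [2, 2, 3]: the smoothest assignment
```

The constraint earns its keep only because matter in the normal environment is in fact cohesive; in a "spiky" environment the same selection rule would systematically pick the wrong matches, which is the point made earlier about environments where the natural constraints fail.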
There is a second and more obvious point at which the computational theorist will contribute to the specification of the organism/environment interactions that fix the meanings of mental terms—namely, in the specification of an intentional interpretation of a formally characterized process. I have argued that content ascription is constrained by the subject’s normal environment. The process of ascribing content to the structures posited by the theory involves the attempt to specify the normal environmental correlates of tokenings of these structures. The fact that Marr succeeded in ascribing distal contents to many of the structures posited in his theory (and Marr is not unique in this achievement) demonstrates that naturalistic psychology is not impossible. Although computational psychology is formal—its taxonomic principles are formal— it develops hand-in-glove with the project that Fodor calls naturalistic psychology.20
My account of the explanatory role of content has been articulated and defended by reference to classical computational models. Classical architectures treat cognitive processes as rule-governed manipulations of internal symbols or data structures that are explicit candidates for interpretation. Connectionist cognitive models, by contrast, do not posit data structures over which the device’s operations are defined. (Many connectionist devices, unlike classical devices, are not correctly described as constructing, storing, and retrieving internal representations.) Connectionist models posit activated units (nodes) that increase or decrease the level of activation of other units to which they are connected until the ensemble settles into a stable configuration. Consequently, connectionist models lack convenient “hooks” on which an intentional interpretation of a process may be hung. Semantic interpretations ascribe content either to individual units in the network or to patterns of activation over an ensemble of units. However, I see no reason why the above account of the explanatory role of content would not apply straightforwardly to connectionist systems. Semantic interpretations of connectionist networks play the same complex explanatory role as do interpretations of classical computational models. Most importantly, they connect a connectionist theory of a cognitive capacity with its pretheoretic explananda.
It remains to be seen whether computational psychology will shed much light on paradigm cases of intentional states, namely, beliefs and desires. The conspicuous successes of computationalism have been in characterizing highly modularized, informationally encapsulated processes such as early vision and syntactic and phonological processing. The states posited by theories of this sort fail to exhibit the complex functional roles characteristic of the propositional attitudes (including, typically, accessibility to consciousness). They are subdoxastic states. Fodor, in The Modularity of Mind, has expressed considerable pessimism about the prospects of characterizing in formal, computational terms more central cognitive processes such as belief fixation (see Chapter 9, this volume). I think this pessimism is well placed, if only because the context-sensitivity of belief ascription makes the programming task appear intractable. However, should a computational account of propositional attitudes be forthcoming, content would play the same explanatory role it plays in theories of modular capacities. An interesting consequence of this eventuality is that propositional attitudes, so characterized, would not have their contents essentially. Type-identical belief-state tokens might have different contents, should they be tokened in relevantly different environments.21 The prospect of a computational theory of belief, therefore, challenges a fundamental commitment of orthodox philosophy of mind.22 Some may conclude that such a theory would not really be about the propositional attitudes, though nothing in the folk conception of the mind would seem to warrant this conclusion.
1. For language-like representations, the formality condition claims that they have their causal roles in virtue of their syntax.
2. For the most detailed exposition of Marr’s theory see (Marr, 1982).
3. Loar (1988) and Segal (1989, 1991) have perhaps come closest. Loar’s proposal concerns commonsense, as opposed to computational, psychology. Segal argues that narrow content plays a central role in Marr’s theory. See (Egan, 1996) for criticism of Segal’s proposal.
4. A tacit appeal to theoretical appropriateness seems to underlie the argument of Graves et al. (1973) for the claim that the explanation of the speaker’s knowledge of her language must appeal to internalized knowledge of grammar.
5. Shapiro (1993) has described my claim (in Egan, 1991) that Marr’s theory is not intentional as “startling.” It should not be startling. A central claim of Field (1978) is, as Field puts it in his (1986) paper, “that psychological theories have a non-intentional core” (114). In any event, interpreters of Marr have not defended the crucial assumption that his theory is intentional.
6. In describing the levels of Marr’s hierarchy as levels of description I do not mean to preclude treating the items classified by level as phenomena and processes rather than purely linguistic devices available to theorists.
7. In arguing for (various) intentional characterizations of the theory of the computation, interpreters of Marr point out that he speaks of the primal sketch as “representing the image,” and of other structures as representing such distal properties as depth and surface reflectance. The assumption underlying such arguments is that Marr’s words in these passages are decisive for settling issues of taxonomy. If theory interpretation were so simple, much of the philosophy of science would be out of business. The individuative principles of a scientific theory can rarely be read off the language used to articulate the theory. Marr is not generally careful or consistent in his language. There is no reason why he should be—he is not focusing on the issues that have concerned philosophers. In the passage I have quoted in the text, however, Marr is explicitly discussing fundamental commitments of the information-processing approach, in particular, how the theory of the computation is to be understood; so the passage bears special significance in the context of the current issue.
8. Colin McGinn has pointed out to me that the theory of the computation is intentional in the following sense: it does specify an intended interpretation of a computational process—the intended interpretation is mathematical. The topmost level of a computational theory characterizes the system as computing a series of functions defined on mathematical entities. I am quite happy to say that a computational theory is intentional in this rather unusual sense. This is not the sense in which interpreters of Marr have taken his theory to be intentional. (They have assumed that the theory characterizes the system, essentially, as computing a function defined on aspects of the visual domain, and this is precisely what I deny.)
9. This is not to suggest that the theorist can ignore the subject’s environment in attempting to formulate a computational description of the device. Quite the contrary. See Section 14.5.
10. This is an adaptation of an example from (Davies, 1991).
11. See Cummins (1983), chap. 2, for an account of functional analysis.
12. The directness requirement precludes interpreting a desk as an adder, since the assignment of numbers to states of the desk requires the interpreter to compute the addition function herself. The system is not doing the work. The directness requirement has yet to be precisely specified, but see Cummins (1989), chap. 8 for discussion. I gloss over this issue here primarily because, as we shall see below, the “problem” of ruling out unintended interpretations of computational systems typically does not arise.
13. The existence of more than one interpretation meeting the directness requirement simply shows that the formally characterized device is capable of computing more than one cognitive function. The visex, described above, would compute a function on the auditory domain if it were embedded differently in the organism.
14. Others, such as Stich (1983), impressed by the fact that content ascription is typically context-sensitive and observer-relative, have concluded that cognitive psychology should not advert to content at all.
15. Commentators who have thought narrow content to be Marr’s content of choice have presumably done so because they recognize that content-determining correlations with distal properties can vary wildly across environments (see, for example, Segal, 1989, 1991). They fail to notice that for Marr the relevant correlations are those that obtain in the subject’s normal environment.
16. See, for example, Stampe (1977), Dretske (1981), and Fodor (1990). There are, of course, important differences in their accounts.
17. One problem with covariational theories of content is their implication that the meaning of a symbol is given by the disjunction of all of its potential causes. Since “horse” tokenings would be caused not only by horses but also by horsey-looking cows, covariational theories seem to imply that “horse” means horse or horsey-looking cow. (See Fodor, 1990, for discussion.) The “disjunction problem” gives rise to a further difficulty, namely, how to account for the possibility of a symbol’s misrepresenting its object, given that all potential causes of a symbol’s tokening determine its meaning. Though computational theorists have had little to say about misrepresentation (their concern is to characterize what is going on in the normal case, where perception is veridical), it is not hard to see how misrepresentation can arise on the account of content ascription I have sketched above. Structures assigned distal contents (simple or disjunctive) will misrepresent if they are tokened when the normal environmental conditions for their tokening are not satisfied. Suppose that, as part of a military training exercise, Bill, a normal human with a Marrian visual system, is placed in a room where the continuity constraint, which holds that disparity varies smoothly because matter is cohesive, is not satisfied. Bill’s visual system normally computes depth from disparity information. However, in these circumstances, where spikes of matter project in all directions, Bill (or, more specifically, the stereopsis module of Bill’s visual system) will compute the same formally characterized function as he normally does, but he will misrepresent some other property (not a property for which we have a convenient name) as depth. In general, where the constraints that normally enable an organism to compute a cognitive function are not satisfied, it will fail to represent its environment.
18. One might wonder whether what I am calling “proximal content” is really content at all. To be sure, proximal contents do not bear much resemblance to the contents we ascribe in our ordinary predictive and explanatory practices; however, I do think that contents of this sort play a genuine explanatory role in computational accounts of internal processes. To cite a second example, Marcus (1980) interprets the structural descriptions constructed in the course of natural language comprehension as representing not distal objects (or public language sentences) but the items in stacks or buffers of the parser. In both the vision and parsing cases, interpreting a structure as representing other structures constructed earlier in the process serves the important function of allowing us to keep track of what the processor is doing. Given that the rationale for content ascription in computational psychology is primarily explanatory, I think that proximal content should be treated as a species of content, though perhaps only as a sort of “minimal” content.
19. Matthews (1988), Segal (1989), and McGinn (1989) note another reason to resist an exclusively causal account of content. They argue that the contents of mental representations seem to be partly determined by the sorts of behaviors that they tend to produce. Whether a structure whose tokening is caused by both cracks and shadows means crack, shadow, or crack or shadow depends in part upon whether its tokening contributes to the production of behavior appropriate to cracks, shadows, or both.
20. Naturalistic psychology, construed as the specification of the organism/environment relations that fix the meanings of mental representations, should not be confused with the enterprise that is sometimes called “the naturalization project” in semantics. The latter attempts to specify sufficient conditions, in a nonintentional and nonsemantic vocabulary, for a mental state’s meaning what it does. It is a purely philosophical project, not the concern of psychologists.
21. For example, a computational theory of belief would type-identify my water beliefs and my Twin Earth doppelganger’s twater beliefs, although intentional interpretations appropriate to our respective worlds might assign different broad contents to our type-identical beliefs. A computational theory of belief would, therefore, respect the intuition that has been the prime motivation for the postulation of narrow content—that doppelgangers are identical in psychologically relevant respects, and hence should be subsumed under the same psychological generalizations. But because a computational theory is not committed to narrow content, it can also accommodate the intuition that the subject’s environment is a determinant of her belief contents.
22. But see Matthews (1994) for an account of propositional attitudes that denies that they have their contents essentially.
The world is complicated. One way that intelligent agents might deal with that complexity is to build internal models—simplified representations of the world that still preserve all of the information necessary for effective action. The papers in this part present a variety of modeling approaches, each focused on representing different kinds of structure (categorical structure, value structure, predictive structure, and causal structure). Each has been theoretically fruitful both for understanding how humans work and for building machines that act effectively in the world.
In chapter 15, Cameron Buckner discusses the history of empiricist approaches to the problem of extracting useful abstract categories from low-level data. He shows that modern deep neural networks can be seen as performing certain kinds of transformations that reliably solve this problem. The geometric understanding of neural networks and what they do has always been important; Buckner shows how this tradition is continued and expanded by modern techniques.
Julia Haas’s “The Evaluative Mind” (chapter 16) discusses reinforcement learning models. These focus on agents who model the world to effectively act in it, updating policies in response to feedback about the success of their actions. Haas’s essay provides some insight into how reinforcement learning might provide a a basis for building desire, value, and the will into artificial systems. At its most controversial, she notes, this culminates in the ‘Reward Is Enough’ hypothesis, which suggests that all intelligent behavior can ultimately be understood in terms of trying to maximize some intrinsic reward signal.
In chapter 17, an excerpt from his article “Whatever Next?,” Andy Clark demonstrates the power of predictive coding models, which use error-correction feedback both to update models of the world and to act effectively in it. Building on a constant refrain in artificial intelligence (AI) research, Clark argues that the brain has no need to construct a complete internal model of the world that reflects completely and accurately the world around it; instead, the brain learns how to minimize error signals in predicting relevant states of the world looking forward. Clark shows how this change of modeling perspective provides descriptive and explanatory leverage over a host of perceptual, cognitive, and motor phenomena, including disorders of the central nervous system and illusions of various sorts.
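The core predictive-coding loop can be caricatured in a few lines: the system carries a top-down prediction, compares it against incoming sense data, and uses only the resulting error to revise itself. The sketch below is a deliberately minimal toy (the variable names and learning rate are invented for illustration, and real predictive-coding models are hierarchical); it shows the error signal shrinking as the internal estimate comes to anticipate a constant input:

```python
def predictive_update(observations, mu=0.0, lr=0.1):
    """Track an input stream by minimizing prediction error, not by copying the world."""
    error_trace = []
    for obs in observations:
        prediction = mu            # top-down prediction of the sensory signal
        error = obs - prediction   # bottom-up prediction-error signal
        mu += lr * error           # revise the model just enough to quiet the error
        error_trace.append(abs(error))
    return mu, error_trace

mu, errors = predictive_update([5.0] * 100)
print(mu)          # estimate converges on the input
print(errors[0], errors[-1])  # error signal decays toward zero
```

Once the model predicts well, almost nothing flows upward: on this picture, only the surprising residue of experience needs to be processed.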
Finally, Judea Pearl’s “Theoretical Impediments to Machine Learning” (chapter 18) discusses recent advances in building devices that can learn the causal structure of their worlds from correlational data and, importantly, from facts about the consequences of interventions on the world. Such models allow agents to pick up on salient features of their world, guide interventions to produce desired outcomes, and diagnose the sources of error in their predictions. Models of the sort that Pearl advances are now routinely used to understand the causal relations among many variables in complex variable sets and have been used as models of human development. In building machines that search for and discover causal truths about the world, one gives machines the kinds of tools for building internal representations of the causal structure of the world, knowledge that might be thought to comprise the bulk (or at least a sizable portion) of our knowledge of the world as a whole.
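Pearl’s distinction between correlational and interventional knowledge can be made concrete with a toy structural causal model. In the sketch below (entirely hypothetical, and not Pearl’s notation), a hidden cause Z drives both X and Y. X and Y are therefore strongly correlated in observational data, yet intervening to set X, which severs the Z-to-X link, leaves Y untouched, exactly the kind of fact a purely correlational learner cannot recover:

```python
import random

def sample(n, do_x=None, seed=0):
    """Toy structural causal model: Z -> X and Z -> Y, but no X -> Y edge."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = z if do_x is None else do_x      # do(X = v) cuts the Z -> X arrow
        y = 2 * z + rng.gauss(0, 0.1)
        data.append((x, y))
    return data

def corr(data):
    """Pearson correlation between the two columns."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cov = sum((x - mx) * (y - my) for x, y in data) / n
    vx = sum((x - mx) ** 2 for x, _ in data) / n
    vy = sum((y - my) ** 2 for _, y in data) / n
    return cov / (vx * vy) ** 0.5

def mean_y(data):
    return sum(y for _, y in data) / len(data)

observational = sample(10_000)
print(corr(observational))                 # X and Y look tightly linked...
print(mean_y(sample(10_000, do_x=-2.0)))   # ...but forcing X low
print(mean_y(sample(10_000, do_x=2.0)))    # or high leaves Y unchanged
```

The observational correlation is near 1, while the two interventional means of Y coincide: the model encodes that X does not cause Y, a fact invisible in the correlations alone.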
One question—still very much open—is whether we ought to expect to settle on a single modeling strategy or whether effective mind design requires a combination of different models in different domains. Combining reinforcement learning and deep neural networks, for example, was key to success for Google’s AlphaGo. Conversely, advocates of predictive coding often emphasize the theoretical and practical power that comes from using a unified modeling strategy for every stage of perception and action. And as we have suggested, understanding causal structure might be thought to be prerequisite for concept learning (as many concepts involve causal entanglements), for predicting what will happen next, and for planning effective actions in a complex and messy world.
The papers in this part naturally go together with the papers in part V, which focus on advances in cognitive science that make some of these modeling strategies more or less plausible. Essays by Rumelhart and by Churchland and Sejnowski (chapters 19 and 20, respectively) provide useful background for understanding Buckner’s paper in particular. Many of the papers in part VI provide a useful counterpoint, as they are motivated by the possibility that modeling is not necessary for intelligent behavior. In addition to the papers in this part, the interested reader might expand upon the modeling literature in several ways.
The question of modeling is intimately bound up with questions about both intentionality and representation that arose in previous parts of the book. A focus on modeling might be seen as a natural extension of broader questions about intentionality and representation: a focus on how something is represented is as important as what is represented.
Different modeling strategies are at the heart of modern machine learning. What was cutting edge at the time of writing may well be old hat by the time this book is in your hands. Nevertheless, there are a few additional papers that are worth recommending:
Deep Learning. Deep learning models, in some form or another, are likely to remain a core workhorse of modern machine learning. The interested reader might consider:
Reinforcement Learning. Reinforcement learning (RL)—either alone or as a hybrid with deep learning—is similarly widespread in real-world commercial systems.
Predictive Coding. The philosophical literature on predictive coding has exploded in recent years:
Causal Models. The literature on causal modeling can be a difficult nut to crack for the casual reader.
Modeling Strategies. The debate over “neat” versus “scruffy” approaches to modeling the mind (the distinction is variously attributed to Schank, Abelson, or Minsky) is a perpetual one. It is not limited to computer science: much of the argument in the past two decades over the evolution of language (for example) has been over whether language should be seen as one good trick, suddenly appearing on the stage, or whether it was assembled piecemeal from a variety of sources. Within computer science, the debate might be seen as reflecting the two strands that gave rise to the discipline—the elegant simplicity of pure mathematics combined with the unashamed pragmatism of engineering. Proponents in the debate often occupy implausible extremes, but the heart of the dichotomy is a trade-off between simplicity and power that is faced by any modeler.
Cameron Buckner
2023
Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition,…and many other domains.
—Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep Learning” (2015)
“Forming Abstractions” was one of the key AI abilities listed in the 1955 Dartmouth AI proposal…however, enabling machines to form humanlike conceptual abstractions is still an almost completely unsolved problem.
—Melanie Mitchell, Artificial Intelligence: A Guide for Thinking Humans (2019)
Can Deep Neural Networks (DNNs) usefully model the human faculty of abstraction?1 Current opinion on this question exhibits a stark and puzzling divide. That DNNs are capable of some sort of abstraction—indeed, that abstraction is their distinguishing strength—is often treated by DNN researchers as so obvious as to barely require mention. DNNs’ success in recognizing objects in images, words in spoken speech, and strategies in a game of Go or chess is usually thought to derive from their ability to discover increasingly abstract patterns in complex data as it is processed by the hierarchy of node layers between input and output. There has even been hope that these networks model the way that human perceptual cortex discovers and manipulates abstract patterns in tasks like object and speech recognition. Neuroscientists of perception have long theorized about the existence of “abstraction layers” in cortex: for example, that early vision specializes in the detection of specific and local patterns such as colors, contrasts, and shadings, which are the basis for the detection of more abstract properties, such as lines and angles, in middle vision, which in turn allow detection of still more abstract properties, such as shapes and figures, in late vision, and eventually of fully composed scenes and situations in the latest stages of the ventral stream (DiCarlo and Cox, 2007; Goodale and Milner, 1992; Hubel and Wiesel, 1967; Riesenhuber and Poggio, 1999). The idea that this hierarchy of abstraction can be modeled by a DNN provided inspiration for the first “deep” architectures and has more recently been bolstered by empirical evidence that the same kinds of features are recoverable at comparable depths of both primate perceptual cortex and an image-classifying DNN’s layer hierarchy (Fukushima and Miyake, 1982; Khaligh-Razavi and Kriegeskorte, 2014; LeCun et al., 1990; Schmidhuber, 2015; Yamins and DiCarlo, 2016).
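One concrete sense in which stacking layers yields increasingly global (and so plausibly more abstract) features is the growth of a unit's receptive field. The sketch below is an illustrative calculation, not a claim about any particular network; it uses the standard receptive-field recurrence for stacked convolutions, with arbitrary kernel and stride values. Each added layer lets a single top-layer unit see a wider patch of the input, which is part of why deeper layers can respond to lines, then shapes, then whole objects:

```python
def receptive_field(num_layers, kernel_size=3, stride=1):
    """Input pixels visible to one top-layer unit after stacking identical conv layers."""
    rf = 1     # a unit in the input layer sees exactly one pixel
    jump = 1   # spacing, in input pixels, between adjacent units at the current depth
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

for depth in (1, 3, 8):
    print(depth, receptive_field(depth))   # stride-1 stacks grow linearly with depth
print(receptive_field(8, kernel_size=3, stride=2))  # striding grows the field geometrically
```

With stride 1 the field grows by two pixels per layer; with stride 2 it roughly doubles per layer, so even modest depth suffices for units that integrate over the whole image.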
At the same time, skeptics take it as equally obvious that a fatal weakness of DNNs lies in their inability to learn and manipulate certain abstractions. For example, Melanie Mitchell argues, in the paper quoted in the epigraph above, that AI research in the last sixty years has made almost no progress on the core goal of enabling machines to form and manipulate human-like conceptual abstractions (Mitchell, 2019). Gary Marcus expresses similar concerns, holding that “knowledge represented in deep learning systems pertains mainly to (largely opaque) correlations between features, rather than to abstractions like quantified statements (e.g. all men are mortal)” (Marcus, 2018a). Coming closer to resolving the apparent tension with DNN proponents, François Chollet of Google Research argues that there are two distinct forms of abstraction in human cognition, the first of which is found in intuition and perception and can be modeled by DNNs, but the second of which drives explicit deductive reasoning and higher cognition and remains the “fundamental weakness of current [DNN] models” (Chollet, 2020). These critics all seem to agree on some degree of nativism: that acquiring fully human-like abstractions requires at least some basic stock of innate specialized representations. In particular, these critics point to research on “core knowledge” systems involving concepts such as OBJECT, AGENT, NUMBER, and CAUSE, which developmental psychologists such as Susan Carey and Elizabeth Spelke have argued are innate in humans and help us bootstrap a theory-like knowledge of mathematics, social relationships, and physical causality (Carey and Spelke, 1996; Marcus, 2018b).
So on the one hand, proponents of deep learning argue that their models succeed because they are good models of mental abstraction, and, on the other hand, skeptics argue that they fail because they are not. Though there is room here to lean on qualifiers like “perceptual” or “conceptual,” this opposition sets up at least an apparent mystery, given that neither the skeptics nor proponents explicitly distinguish their use of the term “abstraction” from its use by their critical targets. We could pull at other threads in this knot—at “intelligence,” “rationality,” or “reasoning,” for example—but doing so usually just brings us right back to the others. Psychometric tests of intelligence are designed to assess our ability to detect ever-subtler abstractions, and skeptics often point to AI failures on such tests—like Raven’s matrices, Bongard problems, or new abstraction-oriented psychometric test batteries like Chollet’s Abstraction and Reasoning Corpus (ARC) (Chollet, 2019; Hernández-Orallo, 2017; Mitchell, 2019)—as evidence that they are not processing these stimuli in a human-like way.
Further, reasoning and abstraction are two sides of the same coin. A decision is often said to be based on reasoning when it relies on relationships that are abstract enough to resist description in terms of occurrently perceivable features of the environment. Consider the classic example of Chrysippus’s dog: the dog is said to reason by exclusion when, chasing prey down a trail that leads to a three-way fork, it sniffs the first and second paths and, failing to detect the prey’s scent in the first two options, takes the third straightaway, without sniffing it as well. The decision, it seems, cannot be explained as easily as if the dog had taken the first or second trail upon sniffing it; the choice of the third, unsniffed path must be based on the situation’s more abstract logical form. The most abstract relationships are often thought to be those that are difficult or impossible to characterize in terms of specific perceptual cues, such as the syncategorematic terms of logic or the numerical concepts of mathematics.
Since abstraction has received less direct attention than the other threads in this knot—with numerous works already written on intelligence or reasoning in AI (Hernández-Orallo, 2017; Legg and Hutter, 2007; Poole et al., 1998)—I focus on abstraction here. This focus presents us with a qualitatively and quantitatively rich landscape on which to map and explain recent progress in AI. We all seem to understand a kind of intuitive continuum of abstractness in properties—with local, simple properties, such as colors, contrasts, and shadings, often regarded as the simplest perceptual features; more composite features like lines and angles being slightly more abstract; figures and objects more abstract yet; followed by scenes and situations; abstract types of situations like justice and war; and finally the most abstract properties of all, higher-order mathematical and logical forms. This intuitive continuum may not withstand deeper philosophical scrutiny, but I argue here that many of DNNs’ successes can be attributed to having made philosophically and empirically significant progress by clarifying the middle, murkier zones of this scale.
To elucidate the relevant concept(s) of abstraction, I adopt a strategy that melds the history of philosophy with recent empirical work. Some of the fears that DNN-based processing is alien or opaque can be alleviated by drawing upon empiricist theories of abstraction and its role in cognition, elaborated by philosophers like Locke, Hume, and Berkeley. These accounts illuminate and contextualize otherwise technical details of recent DNN architectures and show how these advances fit into empiricist theories of abstraction. But AI can also inform philosophy: I argue that DNN architectures can help resolve long-standing philosophical puzzles and debates regarding empiricist accounts of abstraction and its role in the acquisition of general category representations.
The structure of the remainder of the paper is as follows. In section 15.2, I canvass the role of abstraction in empiricist philosophy of mind. In section 15.3, I review one of the most popular kinds of DNN architecture—Deep Convolutional Neural Networks (DCNNs)—with particular focus on the components that may be relevant to abstraction. In section 15.4, I tie these stories together by suggesting that DCNNs bring together several different forms of abstraction into an intricate division of labor. Finally in section 15.5, I return to the concerns of the deep learning skeptics and ask whether the forms of abstraction that DCNNs do model can resolve all the philosophical puzzles for empiricist theories of abstraction and, if not, which puzzles remain.
Empiricism, as I understand it, is the doctrine that all knowledge comes from the senses. Understood as a thesis about human concept formation, it is a thesis about the origins of the mind’s representations—that they and/or their structure is derived, causally and representationally, from sensory experiences or their components. Call this view “origin empiricism.” Origin empiricism is the view that prominent DNN researchers seem to have in mind when they describe their successes, such as when the developers of AlphaZero claim that it learns to play Go “without human knowledge” using a “tabula rasa algorithm” (Silver et al., 2017).
To avoid confusion at the outset, we should set aside some alternative ways of construing origin empiricism. For example, Gary Marcus portrays the empiricist position as requiring that all components of an architecture be derived in some way from experience—including basic representational formats, mental faculties, and inferential capacities. This position, however, has not been defended by any serious empiricist philosopher, and for good reason. Without basic capacities for attention, memory, inference, and abstraction, no mind could learn anything from any amount of perceptual experience. These domain-general faculties, moreover, plausibly develop without the aid of experience, their roles in cognition secured by more general neural wiring principles that could be innate in the sense of being under a high degree of genetic control (Zador, 2019). Thus, it is more productive to construe origin empiricism as allowing that domain-general faculties or formats may be innate but requiring that all domain-specific representations be derived from experience. This way of construing origin empiricism still allows significant debate to be had with the nativists mentioned in the introduction. Specifically, disagreement remains regarding the number and prominence of domain-specific representations that need to be built into a system to enable human-like learning and cognition—most notably the core knowledge concepts mentioned above.
The basic inferential mechanism in an empiricist theory of mind is association; sensory impressions are linked to one another and to actions via associative links (Clatterbuck, 2016). All empiricist theories of mind—from Aristotle to Quine—begin with a toolkit of domain-general principles of associative learning and inference that determine which ideas will become associated—and thereby linked in the transitions of thinking and decision-making—with which others (Buckner, 2017). The details have changed throughout the ages, but this basic toolkit includes similarity, frequency, contiguity, and temporal precedence, as well as the motivating impetuses provided by pleasure and pain. These principles remain the foundation of associative learning theory in psychology today—finding expression in models of classical conditioning, instrumental conditioning, and configural learning (Pearce and Bouton, 2001)—which in turn has inspired and been inspired by the artificial neural network algorithms that served as the precursors for deep learning (Gluck and Myers, 2001; Pearce, 2002; Squire, 2004).
For an empiricist philosophy of mind to get anywhere beyond the most basic forms of associative learning, it needs some way to rise above simple sensory impressions; and for this reason, every prominent empiricist appeals at key moments to a faculty of abstraction. In the simplest cases, there is no trouble deriving a theory of abstraction from the basic principles of association. If a red light is repeatedly paired with the sound of a bell, then the mind will come to associate the two perceptions by frequency and contiguity, and red lights will come to make one think of bells. Though the basic principles of frequency, contiguity, and reinforcement remain potent drivers of even adult human behavior, much else in human cognition—with apologies to Skinnerian ambition (Skinner, 1948)—operates according to more subtle psychological principles. We can also detect and decide on the basis of properties which are more complex than redness or bell-ringing, and the decisions made on the basis of such properties start to look more like rational inferences (Buckner, 2019b). The burden of explaining these more sophisticated associations on an empiricist framework falls mainly on the principle of similarity—though the similarities must be more sophisticated than mere stimulus generalization from red to orange. We commonly reason on the basis of categories like dog, chair, sandwich, coffee, square, money, vacation, and virtue. If the principles of association are ever to take us so far, then we need some way to understand how diverse instances of such categories come to be viewed as mutually similar to one another. And if the perceived similarities are not themselves innate—in violation of the empiricist prohibition on domain-specific representational structure—there must be some faculty that allows the relevant similarities to be apprehended and learned.
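The light-and-bell pairing above is exactly the territory of classical-conditioning models such as Rescorla and Wagner's, in which associative strength grows in proportion to how surprising the outcome is, and fades when the pairing stops. A minimal sketch (the learning-rate and asymptote values are arbitrary illustrations):

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Associative strength V across cue presentations (Rescorla-Wagner rule)."""
    v = 0.0
    history = []
    for outcome_present in trials:  # True: cue paired with outcome; False: cue alone
        target = lam if outcome_present else 0.0
        v += alpha * (target - v)   # update driven by prediction error ("surprise")
        history.append(v)
    return history

acquisition = rescorla_wagner([True] * 20)                # repeated pairing: V rises
extinction = rescorla_wagner([True] * 20 + [False] * 20)  # unpaired trials: V decays
print(acquisition[-1], extinction[-1])
```

Note that the very same error-driven update rule appears, in elaborated form, in the reinforcement-learning and neural-network models discussed in this part, which is one reason associative learning theory and deep learning have cross-pollinated.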
Enter the faculty of abstraction. We can get a start on understanding the faculty by appealing to the heuristic of relating it to amount of specificity. Consider the difficulty in learning even a mid-level abstract property, such as a geometric shape like triangle. The least abstract representation of a triangle, we might think, is a holistic perception of a particular triangle drawn on a piece of paper. This perception is specific in all respects: the triangle has a specific size, color, and degree of illumination; its angles have precise degrees, and its sides specific lengths; and it has a specific location in the visual field. If we are to learn a more general idea of a triangle using empiricist mechanisms, its common properties must somehow be extrapolated from a set of such specific exemplars, each with different values for these parameters.
John Locke illustrates the complications posed by acquiring a general concept of a triangle in a frequently mocked passage of the Essay:2
The ideas first in the mind, it is evident, are those of particular things, from whence, by slow degrees, the understanding proceeds to some few general ones.…For when we nicely reflect upon them, we shall find, that general ideas are fictions and contrivances of the mind, that carry difficulty with them, and do not so easily offer themselves, as we are apt to imagine. For example, does it not require some pains and skill to form the general idea of a triangle (which is yet none of the most abstract, comprehensive, and difficult), for it must be neither oblique, nor rectangle, neither equilateral, equicrural, nor scalenon; but all and none of these at once. In effect, it is something imperfect, that cannot exist; an idea wherein some parts of several different and inconsistent ideas are put together. (Locke (1690), IV.7.9)
Here, it appears that Locke is contradicting himself; for how could a single idea be both oblique and right-angled, or both equilateral and scalene? And yet Locke’s point is that the general idea of a triangle must somehow include and subsume all of these inconsistent configurations of exemplars. Locke suggests that the acquisition of such a general idea from experience of particulars requires the contribution of an active mind, and many other parts of the Essay develop hypotheses about the mental processes that can carry us from such particular ideas to the general ideas that subsume them.
Unfortunately, Locke seems to offer several distinct mechanisms of abstraction without very clearly relating them to one another. The first mechanism of abstraction discussed by Locke is what Gauker (2011) calls “abstraction-as-subtraction.” This mechanism achieves generality simply by subtracting out all of the forms of specificity that vary amongst members of a category. If one notices that one triangle is scalene and another is right-angled, then one subtracts the exact number of degrees of its angles from the general triangle representation, leaving the number of degrees unspecified. If one notices that one triangle has sides of equal length and another is scalene, one subtracts out the exact lengths of the triangle’s sides, and so on. Through the operation of this mechanism, the mind must “make nothing new, but only leave out of the complex Idea…that which is peculiar to each, and retain that which is common to all” (III, iii, 7). As Gauker notes, such a theory immediately raises some puzzles of its own—specifically, how do we know which particulars to group together before subtracting, and how much should we subtract when noticing discrepancies? For example, how does one know to exclude from the process a three-sided figure whose sides do not join at one point? And how does one avoid subtracting out the presence of angles completely upon learning that their degrees may all differ? As Gauker puts it with some frustration, “how the mind forms such general ideas [such as angle in the abstract] is precisely the question that the abstraction-as-subtraction theory is supposed to answer” (2011, 27).
A second Lockean mechanism of abstraction Gauker (2011) calls “abstraction-as-composition.” According to this approach, more abstract ideas are formed by acts of the mind that join together less abstract ones. For example, we might form the general idea of a triangle simply by joining the simpler ideas of “3,” “angle,” and “side” together in the right way, rather than extracting such commonalities from messy exemplars. Many abstract ideas can be thought of as composed in this way—Locke in particular describes the idea of lead as being formed by joining “the simple Idea of a certain dull whitish colour, with certain degrees of Weight, Hardness, Ductility, and Fusibility” (II, xii 6, discussed in Gauker (2011, 20)). The obvious drawback to this mechanism is that it seems to presume that the learner begins with a stock of simpler abstractions and compositional principles from which the more complex ones can be formed. Recall that origin empiricists are trying to avoid populating the mind with innumerable innate ideas and domain-specific rules; but abstraction-as-composition seems to be taking a great many such building blocks for granted. If we consider the full learning problem as posed by a more recent empiricist like Quine—or as it confronts a DNN modeler today—it begins not with predigested input about angles and lines but rather with raw, unprocessed “stimulation of [the] sensory receptors” by “certain patterns of irradiation in assorted frequencies” (Quine, 1971, 82). Taken individually, abstraction-as-composition thus also cannot discharge the burdens of origin empiricism on its own.
These and other difficulties led later empiricists like Berkeley and Hume to abandon Locke’s first two methods of abstraction and rest their hopes instead upon a third, which Gauker calls “abstraction-as-representation.” On this approach, exemplars are grouped together in general category sets, and particular exemplars are variously chosen to stand in for the general class in processes of categorization and reasoning. This mechanism is most clearly articulated by Berkeley. Berkeley pithily dismissed abstraction-as-subtraction and abstraction-as-composition by highlighting problems with the troublesome triangle passage, opining “is it not a hard thing to imagine that a couple of children cannot prate together of their sugar-plums and rattles…’til they have first tacked together numberless inconsistencies?” (Berkeley (1710), Introduction 13-14). Instead, Berkeley thought that ideas are always of particulars, so it would be better to build general category representations by treating them as sets of exemplars; in that manner, “an idea which, considered in itself, is particular becomes general by being made to represent or stand in for all other particular ideas of the same sort” (Berkeley (1710), 12). Unfortunately, though it at least seems to avoid the apparent contradictions in Locke’s triangle passage, abstraction-as-representation faces versions of the same problems faced by the other two mechanisms. For one, to generalize correctly from a representative exemplar, we plausibly need to know which particulars are the most typical instances of a class. For another, we plausibly need to know which of their particular attributes are shared by the other exemplars to be sure that this particular’s idiosyncrasies do not lead us astray. 
Hume, who also recommends Berkeley’s doctrine of abstraction-as-representation, notes these burdens and evinces exasperation in discharging them, writing that the ability to select suitable exemplars is “most perfect in the greatest geniuses…but it can’t be explained by the utmost efforts of human understanding” (Hume (1739), 1.7). Nativists perhaps rightly chide origin empiricists here for appealing to such “magic” when pressed to explain how the mind derives general ideas from experience.
Before considering whether DNNs might fill in these lacunae for origin empiricists, let us consider a fourth and final kind of abstraction—which I have called “abstraction-as-invariance.” A property is invariant if it is unchanged under some systematic transformations of a space or domain. In physics, a search for invariance provides a method to reveal the laws of nature; for example, the law of conservation of angular momentum can be derived from the observation that momentum is invariant under rotation (i.e., the laws of physics do not depend upon the angle of a reference point). In mathematics, topology is a subdiscipline that studies properties that are unchanged under systematic spatial transformations, such as rotations (or more generally, continuous deformations called homeomorphisms). Logicists in the philosophy of mathematics have sought to formally define logical abstractions as those that are preserved under all permutations of the domain—for example, a valid argument form might be thought of as one that remains truth-preserving under all permutations of the propositions that could be joined by syncategorematic terms in that form.
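The idea of abstraction-as-invariance can be made concrete with a toy sketch (pure Python; the helper names are my own, purely illustrative): rotating a triangle changes every coordinate of its vertices, yet the pairwise distances between those vertices, and hence the shape itself, survive the transformation unchanged. That is the sense in which shape is an invariant under rotation.

```python
import math

def pairwise_distances(points):
    """All pairwise Euclidean distances among a list of 2-D points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def rotate(points, theta):
    """Rotate each point about the origin by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
rotated = rotate(triangle, math.pi / 3)

# Every coordinate has changed, but the distances (the 3-4-5 shape) have not:
before = pairwise_distances(triangle)
after = pairwise_distances(rotated)
invariant = all(abs(a - b) < 1e-9 for a, b in zip(before, after))
```

The same check could be run for translations or reflections; properties that survive a wider family of transformations are, in this sense, more abstract.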
An influential source of these ideas linking abstraction with invariance is Hume’s Treatise (and also later the work of Cantor, Frege, and Boolos—see Shapiro, 2004; Antonelli, 2010), where he speculated that mathematics is the most perfect science because a notion of cardinal number can be derived from a perfectly invariant one-to-one correspondence (bijection) between two sets:
Algebra and arithmetic [are] the only sciences in which we can carry on a chain of reasoning to any degree of intricacy, and yet preserve a perfect exactness and certainty. We are possessed of a precise standard, by which we can judge the equality and proportion of numbers; and according as they correspond or not to that standard, we determine their relations, without any possibility of error. When two numbers are so combined, as that the one has always a unit answering to every unit of the other, we pronounce them equal. (I. III. I.)
Following Frege, this proposal has come to be called “Hume’s Principle.” Similar maneuvers have been deployed to define the notion of an abstract logical property and abstract objects today (Fine, 2002). This final form of abstraction seems unlike the others—it is less clear, for example, that humans actually work through any such proof in acquiring the notion of a number in the course of normal cognitive development. It may instead be something more like a regulative ideal or justificatory procedure than the first three kinds of abstraction, for it appeals to a kind of limit of transformations—identity under all permutations, or invariance under all homeomorphisms—that human cognition could only approximate or demonstrate after training in formal proof techniques. As we will see, however, it is an interesting supplement to the other forms of abstraction in the present conversation because it provides a procedure that can be used to approach the most abstract concepts that the rationalists have considered the strongest evidence for nativism.
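Hume's Principle lends itself to a small operational sketch (illustrative only; the `equinumerous` helper is my own device, not drawn from Hume or Frege): two finite collections share a cardinal number exactly when their members can be paired off one-to-one with nothing left over, and no counting is needed to establish this.

```python
def equinumerous(a, b):
    """Hume's Principle, operationally: two collections have the same
    cardinal number iff there is a one-to-one correspondence (bijection)
    between them. For finite lists, try to pair each item of `a` with a
    distinct item of `b`, with none left over on either side."""
    unmatched = list(b)
    for _ in a:
        if not unmatched:
            return False        # ran out of partners: a outnumbers b
        unmatched.pop()         # pair this item with one partner from b
    return not unmatched        # leftovers would mean b outnumbers a

# Equality of number falls out of pairing alone, with no reference to
# what the items are:
same = equinumerous(["x", "y", "z"], [10, 20, 30])
diff = equinumerous(["x", "y"], [10, 20, 30])
```

The pairing relation is the perfectly invariant correspondence the passage above describes: it holds or fails regardless of how the members of either collection are permuted.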
To summarize, empiricists have explored (at least) four qualitatively distinct forms of abstraction: i) abstraction-as-subtraction, ii) abstraction-as-composition, iii) abstraction-as-representation, and iv) abstraction-as-invariance. Traditionally, many theorists treated these different approaches as unrelated in the best case and as theoretical competitors in the worst. Flaws noted with one form of abstraction were often thus taken as arguments for the others in internal debates amongst the most influential empiricists.
In the next section, I question this oppositional framework by reviewing the key components of deep convolutional neural networks, arguing that they provide a novel way to understand abstraction that obviates much of the previous debate. Whereas previously we might have approached “abstraction” as naming a unitary kind of mental operation which should be defined in terms of necessary and sufficient conditions—thus putting pressure on origin empiricists to say what, conceptually, the four kinds of abstraction have in common, or rejecting some forms in favor of others—we can instead explicate a kind of mechanism that performs diverse operations attributed to all four forms of abstraction simultaneously. This model-based approach to abstraction is more in line with recent model-based approaches to explanation in cognitive science (Boyd, 1999; Craver, 2007; Godfrey-Smith, 2006; Weiskopf, 2011) than the options we have just canvassed, and arguably dissolves some of the problems that vexed empiricist philosophers of previous centuries.
Until recently, a type of DNN architecture called a “Deep Convolutional Neural Network” (hereafter DCNN) has been considered the most reliably successful tool on the widest range of problems.3 This architecture is a key component in models of image recognition (AlexNet), strategy gameplay (AlphaGo), scientific data analysis, and many other applications. These DCNNs all inherit their basic features from an older neural network prototype developed by Fukushima in the late 1970s called “Neocognitron” (Fukushima, 1979), which was specifically designed to model certain aspects of perceptual abstraction thought to occur in primate neocortex (for the history, see Buckner, 2018, 2019b; Schmidhuber, 2015). Neocognitron was perhaps the first network that was truly “deep” (with 4–10 layers, depending on how they are counted), but its most powerful innovation was the way it combined two different types of processing nodes—linear convolutional filters and nonlinear downsamplers—in a single network. These different kinds of node are stacked in a deep hierarchy in what I will call its characteristic “abstraction sandwich” motif. These sandwiches combine the strengths of diverse operations, allowing a single mechanism to simultaneously perform all four forms of abstraction alluded to in the previous section.
The distinction between two different types of processing neurons was inspired by influential work on the neuroanatomy of vision by Hubel and Wiesel (1967). Using single-cell recordings in early visual areas of cats, they identified two different cell types, “simple” and “complex” cells, based on their differential firing patterns. Whereas simple cells detect a low-level feature like an edge or contrast in a particular orientation and position, complex cells take input from many simple cells and fire in response to the same features but with a greater degree of positional invariance. Neuroscientists at the time speculated that many layers of these simple and complex cells might be iterated in the cortical visual processing stream, and their interplay might explain our own ability to recognize increasingly abstract features in diverse locations and poses. Neocognitron modeled this behavior by containing “simple” nodes that performed convolution (a type of linear algebra operation elaborated below) to detect features at particular locations and in particular poses, and “complex” nodes that averaged output from many spatially nearby simple nodes, aggregating their activity to detect those features across small shifts in its location or pose. Once the first sandwich has extracted somewhat abstract features from the raw input, another sandwich can be stacked atop it, focusing its computational resources on extracting even more abstract features from the last layer’s output. In principle, this process could be repeated indefinitely, with many abstraction sandwiches stacked hierarchically, such that processing gradually detects more and more abstract features across an ever broader range of visuospatial variance. 
With these innovations, Neocognitron was able to outperform other neural networks of the day on difficult tasks characterized by high variance—such as handwritten digit or facial recognition—by modeling the hierarchical processing cascade of mammalian neocortical processing streams.
Each of these operations bears elaboration. Let us begin with convolution. Perceptual input is typically passed to such a network in a gridlike structure, and the smallest unit of information in a visual grid is often a pixel, which is typically a multi-dimensional vector of Red, Green, and Blue color channel intensity detected at that location. Convolution is a linear algebra operation over matrices; the operation transforms the vector values for a spatial chunk of pixels (usually a rectangle) in a way that maximizes some values and minimizes others. The patterns learned by convolutional nodes are called filters or kernels, because after training they amplify the presence of a particular kind of feature in its output; for example, a useful vertical-edge kernel maximizes values corresponding to a vertical edge while minimizing everything else. In the next layer of a sandwich, each convolution operation passes its output to a rectified linear unit (ReLU), which activates if the output of convolution exceeds a certain threshold. In other words, the output of convolution is only passed up the processing hierarchy if the feature is deemed to have been detected at that location. The net result of passing a vertical-edge kernel across the whole image would be a representation that shows all and only the vertical edges, creating a new intermediary representation called a “feature map.” A vertical-edge feature map might be visualized by imagining sliding a stencil for a vertical edge across a whole image, and recording the degree to which the underlying image cleanly fills out the stencil on a separate, spatially organized map of locations.
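A minimal, illustrative implementation of this simple-cell step may help fix ideas. The sketch below assumes a toy single-channel (grayscale) image and a hand-set 2x2 vertical-edge kernel; in a real DCNN the kernel weights are learned from data rather than specified in advance, and inputs have multiple color channels.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel across the image
    and take a weighted sum at each position (the 'simple cell' step)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    """Rectified linear unit: pass activation only above the threshold 0."""
    return [[max(0, v) for v in row] for row in fmap]

# A toy image containing one vertical edge (dark left, bright right):
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# A hypothetical vertical-edge kernel: responds where intensity jumps
# from left to right, and cancels out over uniform regions.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(convolve2d(image, kernel))
```

The resulting feature map is exactly the “stencil record” described above: activation is concentrated at the positions where the vertical edge sits, and zero everywhere the image is uniform.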
Typically, however, recognizing a general category requires more than merely recognizing simple features like vertical edges; we might also need to detect edges in a wider diversity of presentations, in different locations, sizes, and angular orientations (so that, at later stages, various edges might be assembled into useful composites like shapes or digits). The addition of Fukushima’s “complex” units completes an abstraction sandwich by taking input from several nearby convolutional nodes below and using a downsampling operation to aggregate their outputs. Generally speaking, downsampling is a kind of operation that selects a subset of input data to build a compressed representation that preserves only information from the original that is deemed especially relevant. File size reduction in the .jpg or .gif file formats and sound compression in .mp3s both involve downsampling. Using downsampling, we can now efficiently express the fact that an edge occurred approximately here in some spatial orientation, irrespective of where it appeared or how it was oriented. The net effect of globally multiplying the input by a variety of edge-detecting kernels and combining their outputs using downsampling is like applying an edge-detector filter in a digital photograph editing program; the result is a simplified image representation that reveals all the edges wherever they are located and however they are oriented, and “subtracts out” other information. In state-of-the-art networks, downsampling is usually performed by an operation called “max-pooling.” Max-pooling involves simply passing along the greatest activation (above a critical threshold) amongst the inputs taken from filter nodes at spatially nearby locations. In the next section, I explain how a hierarchy of convolution-ReLU-pooling sandwiches shows how the brain can perform all four kinds of abstraction explored in section 15.2 using the same kind of fundamental mechanism.
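The complex-cell step can be sketched in the same illustrative spirit (the feature maps below are hypothetical, hand-set values standing in for the output of learned filters): max-pooling keeps only the strongest activation in each window, so two feature maps whose detected edge sits at slightly shifted positions collapse to the same pooled representation, which is precisely the positional invariance described above.

```python
def max_pool(fmap, size=2):
    """'Complex cell' downsampling: keep only the strongest activation in
    each size x size window, discarding its exact position within the window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

# Two hypothetical feature maps whose detected edge sits at slightly
# different horizontal positions...
fmap_left = [[0, 18, 0, 0],
             [0, 18, 0, 0]]
fmap_right = [[18, 0, 0, 0],
              [18, 0, 0, 0]]

# ...yield identical pooled outputs: the small positional shift has
# been "subtracted out" of the representation.
pooled_left = max_pool(fmap_left)
pooled_right = max_pool(fmap_right)
```

Stacking another convolution layer atop these pooled outputs, and pooling again, is what repeats the sandwich and extends the invariance to larger shifts.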
Despite their success and widespread popularity, a consensus theoretical explanation as to why DCNNs work so well has remained elusive. Following several other analyses (DiCarlo et al., 2012; Montufar et al., 2014; Patel et al., 2016; Schmidhuber, 2015), I characterize the computational core of these networks in terms of the three features just described: (1) many layers of hierarchical processing which interleave two different kinds of computational nodes, (2) linear convolutional filters, and (3) non-linear “poolers.”4 Though there remains significant debate as to which of these operations is essential for good performance, the empiricist ideas in section 15.2 help show why this particular combination of features is so effective. In turn, artificial networks incorporating these features may explain why the neural structures and processes they model are so effective in solving the computational problems faced by the brain.
Return to the problem of acquiring general ideas of mid-level abstraction which plagued Locke—categories like triangle, sandwich, or chair. Perhaps some geometric categories, like triangle, can be defined with perfect invariance, but psychologists like Rosch (1978) and Barsalou (1999) have argued convincingly that membership in most mid-level categories deployed in human cognition is too graded and idiosyncratic to be defined so cleanly. They are instead, as Wittgenstein emphasized, “family-resemblance” categories that lack a perfectly invariant essence; this is one of the primary reasons that rule-based attempts at building an artificial intelligence failed to master them (Brooks, 1991; Hofstadter, 1985). However, if we treat invariance as a more graded and multi-dimensional notion, a domain-general ability to control for systematic variation in perceptual input might unite the four forms of abstraction canvassed in section 15.2.
Specifically, computer vision and machine learning researchers have struggled against the same concerns that troubled Locke: that triangle, chair, cat, and other everyday categories are difficult to recognize because their instances can be encountered in a variety of different poses or orientations that differ in their low-level perceptual properties. A chair seen from the front does not look much like the same chair seen from behind or above; so we must somehow unify all these diverse perspectives to build a reliable chair-detector. The set of variables on which perspectives can vary tends to be the same for a very wide range of common categories; computer vision researchers have come to call these repeat offenders “nuisance variables,” as they present systematic challenges to category recognition and decision-making in the real world. Common examples of nuisance parameters are size, position, and angular rotation in visual recognition tasks, or pitch, tone, and duration in auditory recognition tasks. The challenge facing a computer vision modeler is thus to develop an artificial agent that can reshape its sense of perceptual similarity by controlling for common forms of nuisance variation. An agent that is able to do so should be able to judge a cat seen from the front as more similar to a cat seen from behind than to a dog seen from the front, despite the opposite conclusion being apparent in raw perceptual input statistics. Such abstract similarities may be clearest in cases of perceptual recognition, but the phenomenon also extends to amodal domains, such as chess or Go board patterns—as abstract patterns in board positions might be recognized not only in photographs of a game board, but also (given proper training) in symbolic notation.
So, how do we form a general idea like triangle when we have only been exposed to particular and idiosyncratic exemplars with mutually inconsistent surface properties—sizes, spatial positions, angular rotations, and degrees? When dealing with geometric figures like triangles, this diversity could be overcome by learning an appropriate set of geometric transformations. The right series of affine transformations—contractions, expansions, dilations, rotations, or shears—could transform any arbitrary triangle into any other arbitrary triangle. Cognitively, however, this would be overkill, for we do not require that the exemplars of a common category be rendered perceptually indistinguishable from one another. We only more modestly require that they be rendered more mutually similar to one another than they are to members of opposing categories with which they might initially appear more similar—such as a square or rhombus with lines of the same length and angular rotation.
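The claim about affine transformations can be checked concretely. The sketch below (helper names are my own, purely illustrative) solves for the unique affine map T(p) = Mp + t that carries the three vertices of one nondegenerate triangle exactly onto the three vertices of another, using only a hand-computed 2x2 matrix inverse:

```python
def affine_map_between(src, dst):
    """Solve for the unique affine map T(p) = M p + t sending the three
    vertices of triangle `src` onto the three vertices of triangle `dst`."""
    (x1, y1), (x2, y2), (x3, y3) = src
    # Edge vectors of the source triangle form a 2x2 basis.
    a, b = x2 - x1, x3 - x1
    c, d = y2 - y1, y3 - y1
    det = a * d - b * c          # nonzero for any nondegenerate triangle
    (u1, v1), (u2, v2), (u3, v3) = dst
    e, f = u2 - u1, u3 - u1
    g, h = v2 - v1, v3 - v1
    # M = [dst edge vectors] composed with the inverse of [src edge vectors]
    m11 = (e * d - f * c) / det
    m12 = (f * a - e * b) / det
    m21 = (g * d - h * c) / det
    m22 = (h * a - g * b) / det
    tx = u1 - (m11 * x1 + m12 * y1)
    ty = v1 - (m21 * x1 + m22 * y1)
    def T(p):
        x, y = p
        return (m11 * x + m12 * y + tx, m21 * x + m22 * y + ty)
    return T

# Any nondegenerate triangle can be carried exactly onto any other:
small = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
big_tilted = [(5.0, 5.0), (9.0, 7.0), (4.0, 8.0)]
T = affine_map_between(small, big_tilted)
images = [T(p) for p in small]
```

As the surrounding text notes, this exact identification is cognitive overkill; the point is only that the residual variation among triangle exemplars is of a kind that systematic spatial transformations can remove.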
To articulate a solution, it will be useful to introduce technical notions of a perceptual similarity space and category representations as regions therein (for development and critical examination of these notions, see Churchland, 1989; Gärdenfors, 2004; Gauker, 2011). Perceptual similarity space is a multi-dimensional vector space—with each dimension standing for a perceptually discriminable feature—that plots an agent’s perceptual experience of each exemplar as a unique vector. Vector distance in this space marks the degree of perceived similarity between the different exemplars. A “manifold” is a region of this vector space, which can be taken to mark the boundaries of a category representation. Conceived in this way, the problem facing perceptual categorization is that, as DiCarlo et al. (2012) put it, nuisance variation causes “the manifolds corresponding to different [general categories to] be ‘tangled’ together, like pieces of paper crumpled into a ball” (p. 417). The task of both the brain and artificial agents is to find a series of operations—systematic transformations of this space—that reliably “unfold” the regions corresponding to different categories so that they can more easily be discriminated. More specifically, agents must learn a series of transformations of this space that map disparate triangles to nearby points in a transformed triangle manifold while ensuring that this manifold marks a region that is linearly separable from the manifolds corresponding to opposing categories, like “square” or “rhombus,” whose exemplars might initially appear highly similar in terms of vector distance to certain triangle exemplars because they are of similar sizes and orientations (DiCarlo and Cox, 2007).
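The untangling idea can be illustrated with a deliberately tiny similarity space. In the sketch below (all coordinates are invented for illustration), one dimension of the space encodes a nuisance parameter, viewpoint; in the raw space the two views of the cat lie farther apart than the cat and the dog, and a transformation that controls for the nuisance dimension reverses that verdict:

```python
# Toy perceptual similarity space with dimensions (shape cue, texture cue,
# viewpoint), where viewpoint is the nuisance variable.
import math

def dist(u, v):
    """Euclidean distance: the measure of perceived similarity in this space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

cat_front  = (1.0, 1.0, 0.0)
cat_behind = (1.0, 1.0, 9.0)   # same cat, wildly different viewpoint
dog_front  = (2.0, 3.0, 0.0)

# In the raw space, the two views of the cat are "tangled" apart:
assert dist(cat_front, cat_behind) > dist(cat_front, dog_front)

# A transformation that controls for the nuisance dimension (here, simply
# projecting it out) "untangles" the category manifolds:
def untangle(v):
    return v[:2]

assert dist(untangle(cat_front), untangle(cat_behind)) < \
       dist(untangle(cat_front), untangle(dog_front))
```

Real untangling transformations are of course learned and far less crude than a projection, but the before-and-after reversal of distances is the phenomenon at issue.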
We now have a sturdier conceptual foundation to make sense of Locke’s troublesome triangle passage. Against Berkeley and Hume, Locke need not be interpreted as suggesting that the general category representation of a triangle is an introspectible mental image with inconsistent properties. Rather, the general idea of a triangle might be something more subpersonal, like a transformed category manifold that, if it could be coherently imagined at all, would look more like a painting by Picasso, with many different triangle poses juxtaposed in a spatially impossible mish-mash. This is the sense in which an abstract representation of triangle might involve both all and none of those variations; it controls for them by transforming idiosyncratic exemplars into an abstract representational format that adjusts for nuisance variations, locating exemplars of a common category as nearby points in a transformed manifold. This general manifold itself, however, consists in a whole region of similarity space that should not be interpreted as depicting a single coherent view of an exemplar with some particular configuration of nuisance parameters. In this light, Locke’s comments might charitably be seen as struggling to express a theory of abstraction beyond the reach of his day’s philosophical and mathematical lexicon.
These transformations are achieved by performing the different forms of abstraction described in section 15.2, using the diverse operations in the node-sandwiches described in section 15.3. In short, the convolutional filters can be understood as performing the operation of abstraction-as-composition, and the max-pooling downsamplers can be thought of as performing the operation of abstraction-as-subtraction.5 Multiple inconsistent ways of composing a feature from a sandwich’s less-abstract inputs can be explored by different kernels at the same layer, and those multiple options can then be passed to a downsampler, which aggregates and subtracts out these forms of variation in the pooled feature map that it passes along to the next layer of sandwiches in the hierarchy. As this transformational process of abstraction is iterated throughout a deep hierarchy, the transformed signals may even approach ideals of abstraction-as-invariance, as later layers will have explored more and more systematic transformations and controlled for the influence of nuisance parameters almost completely.
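A minimal toy can show how a convolutional filter plus a max-pooling downsampler trades positional detail for tolerance to nuisance variation. This is a one-dimensional pure-Python sketch with an invented step-detecting kernel, not a real DCNN layer:

```python
# Sketch of a node-sandwich: a convolutional filter composes a feature from
# its inputs (abstraction-as-composition), and a max-pool subtracts away the
# feature's exact position (abstraction-as-subtraction).

def conv1d(signal, kernel):
    """Slide the kernel across the signal, producing a feature map."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(feature_map):
    # Global pooling for simplicity; real networks pool over local windows.
    return max(feature_map)

edge_kernel = [-1.0, 1.0]   # fires on an upward step in the signal

# The same step pattern at two different positions:
early = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
late  = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

# The feature maps differ (positional information is still present) ...
assert conv1d(early, edge_kernel) != conv1d(late, edge_kernel)
# ... but after pooling, the detector's verdict is position-invariant:
assert max_pool(conv1d(early, edge_kernel)) == max_pool(conv1d(late, edge_kernel)) == 1.0
```

Iterating this sandwich, as the deep hierarchies described above do, compounds the tolerance across more and more nuisance parameters.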
Even better—and providing perhaps a twist on the simple construal of abstraction-as-subtraction as discarding information about nuisance parameters—properties subtracted from one representation by transformational abstraction might be given dedicated resources in another. In other words, a transformational hierarchy might come to represent a single exemplar along multiple independent channels, each extracting a representation unfolded along the dimension of an individual nuisance parameter, such as an exemplar’s location in the visual field (as has long been hypothesized in the larger-scale representational division-of-labor between ventral and dorsal stream processing—see Goodale and Milner, 1992). The result is a form of processing that begins with a buzzing, blooming confusion of raw entangled signals and ends with a series of transformed, highly abstracted, dedicated channels that can be mixed and matched at the output layer to flexibly craft answers to a wide variety of categorization and decision-making problems.
Thus far, I have explained only how the same network architecture might perform operations implementing abstraction-as-composition, abstraction-as-subtraction, and at least to some degree, abstraction-as-invariance. What of abstraction-as-representation, where a single exemplar is used to stand in for a whole category? Fascinatingly, machine learning researchers have discovered that information about possible nuisance configurations remains latent in these late-stage, transformed manifolds. We know this because the transformations required to extract abstract category representations from exemplars with specific combinations of nuisance parameters can be performed roughly in reverse to render highly plausible exemplars from arbitrary positions on untangled category manifolds (Goodfellow, 2016). Without elaborating the technical details here, this is the technique behind the “deepfakes” and other creative products of deep learning, such as photorealistic pictures of people who do not exist or photographs rendered flexibly in the style of famous artists. Returning to DiCarlo et al.’s paper-folding metaphor, if a system that has learned to “unfold” the manifolds for different categories could “re-fold” them in a similar manner, then each vector in an abstract category manifold can be remapped to its original perceptual representation with the appropriately specified values of its original nuisance parameters like pose, position, scale, and so on. Moreover, if we pick a novel location on an untangled manifold, and then refold the manifold using this inverse procedure, we can produce photo-realistic exemplars with novel combinations of nuisance parameters that the network never even observed in its training (Gatys et al., 2016).
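The re-folding direction can be caricatured as a decoder that takes a category identity together with explicitly specified nuisance parameters and renders a concrete exemplar. The toy below (the parameter names and the choice of equilateral triangles as the category are my own illustrative choices, not any real generative-model API) renders exemplars at nuisance settings that need never have been observed:

```python
# Sketch of "re-folding": a point on an abstract category manifold plus
# explicit nuisance parameters is decoded back into a concrete exemplar.
import math

def decode_triangle(scale, rotation, cx=0.0, cy=0.0):
    """Render an equilateral triangle with the given nuisance parameters:
    size (scale), pose (rotation), and position (cx, cy)."""
    return [(cx + scale * math.cos(rotation + k * 2 * math.pi / 3),
             cy + scale * math.sin(rotation + k * 2 * math.pi / 3))
            for k in range(3)]

# A nuisance combination the "network" never observed during training:
novel = decode_triangle(scale=2.5, rotation=0.7, cx=3.0, cy=-1.0)
```

Every vertex of the generated exemplar sits at the requested distance from the requested center, so the abstract category plus nuisance settings fully determine a new, well-formed instance.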
This begins to look more like the theory of abstraction provided by Kant (and contemporary Kantians like Lawrence Barsalou—see Gauker (2011, 67)), who suggested that general category representations might consist of “rules of synthesis” allowing one to generate a range of possible exemplars corresponding to an abstract category. Yet with bidirectional transformations in hand, the task might be performed without any explicit rules; and for the networks discussed so far, the transformations are learned from experience using only domain-general mechanisms. If we refrain from transcendentalist indulgences in our metaphysics, the view still counts as empiricist in the relevant sense. These generative capacities would in turn supply us with a straightforward way to generate previously-observed or novel exemplars from abstractions. When engaged again with discriminative capacities, these generated exemplars could ground a capacity for abstraction-as-representation; previously-observed or imagined exemplars could be deployed in acts of demonstrative associative reasoning.
I call the computational processes that provide for these four forms of bidirectional travel between exemplars and abstractions “transformational abstraction.” To summarize, why do DCNNs perform so well, relative to neural networks that lack these features? During training of these DCNNs, node parameters converge on the sequence of transformations that do the best job of solving the widest range of categorization and decision problems on which the network is being trained by increasing the net distance in feature space between the manifolds for the categories that must be discriminated. This transformational ability explains how DCNNs can recognize the abstract similarities shared amongst exemplars in a category like chair or triangle. Remarkably, no innate representational primitives or explicit definitions are required for them to do so. Moreover, the networks themselves discover the right series of transformations to perform; as Goodfellow (2016, 337) put it, “[pooling] over the outputs of separately parametrized convolutions [allows the network] to learn which transformations to become invariant to” for some range of categorization tasks on which they were trained. Insofar as those transformations and the intermediary features they reveal are useful for the recognition of other categories—the transformations useful for recognizing chairs may closely resemble those useful for recognizing other objects like tables or beds—the network will enjoy accelerated learning for those related categories as well, no magic required.
Up to this point, I have argued that there is substantial similarity in the way DCNNs transform perceptual signals and the way human cortex acquires category representations that are at least mid-way up a commonsense hierarchy of abstraction. We are now in a position to return to the puzzle of the introduction and inquire how high up the continuum of abstraction in human cognition such a process of transformational abstraction might carry artificial agents. Is transformational abstraction only of use for perceptual and intuitive categorization as suggested by Chollet, or could it be augmented and extended to cover more theoretical category representations that characterize basic science, logic, and mathematics? Notably, the definition of transformational abstraction just provided also applies to the most abstract properties like cardinality or logical validity, if we extend the forms of nuisance variation that might need to be overcome to things like permutations of the model’s domain. There has even been recent empirical evidence to suggest that empiricist-inspired DNNs can outperform approaches that make use of innate rules and representations for early parsing steps (Ding et al., 2020).
However, it is an open empirical question whether an unaided DCNN (or a nearly empiricist tweak of one) could discover or implement the full range of transformations required to detect cardinality, logical validity, or other highly abstract properties. This is a daunting task because it is unclear how convolution and pooling could be bootstrapped to evaluate complete permutations of a set or domain. Even if DCNNs extrapolate well from a limited sample, these properties require transformation at a kind of logical limit that may be difficult for any method to achieve without explicit formulation of hypotheses regarding the set of permutations that must be evaluated for a complete assessment, or quantificational resources to describe such sets. Thus, DNNs of ever-greater depth and training may be able to approximate such formal abstractions to an ever-greater degree without ever completely achieving them. Additional components corresponding to these resources might need to be added to DCNNs for them to discover mathematical or geometric properties in their full generality. Just such computational challenges may provide extra inspiration for the nativists’ plea for domain-specific innate biases and/or representational structures of the sort studied by the core knowledge camp mentioned in section 15.1. Hopefully, the preceding discussion provides this debate with additional conceptual focus and clarifies the ways in which it might be informed by empirical discoveries.
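The worry about evaluating complete permutations of a domain can be made vivid with a brute-force sketch. Checking even a property that genuinely ignores labels, such as cardinality, over every relabeling of a domain costs n! evaluations, which is one way of seeing why a finite stack of convolution and pooling seems ill-suited to such "logical limit" transformations. The toy property and helper names below are illustrative:

```python
# Brute-force verification of a permutation-invariant property: check it on
# every relabeling of the domain. The factorial blow-up is the point.
from itertools import permutations
from math import factorial

def invariant_under_all_permutations(domain, prop):
    """Check prop on every relabeling (permutation) of the domain."""
    return all(prop(list(p)) for p in permutations(domain))

def has_three_elements(domain):
    # Cardinality: a property that genuinely ignores how elements are labeled.
    return len(domain) == 3

assert invariant_under_all_permutations([0, 1, 2], has_three_elements)

# The cost of the complete check grows factorially with the domain's size:
costs = [factorial(n) for n in range(1, 8)]   # [1, 2, 6, 24, 120, 720, 5040]
```

A learner that only samples from this space can approximate the verdict ever more closely, as the text suggests, without ever performing the complete assessment.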
On the other hand, it seems that transformational abstraction’s potential already exceeds the strictly perceptual, extending across any domain with sources of nuisance variation that could be mapped to systematic geometric dimensions and overcome by finite amounts of convolution and pooling. Board configurations in Go, for example, are not limited to visual or tactile modalities, and they were provided to AlphaGo in symbolic form; so it seems inappropriate to call the patterns learned by AlphaGo merely perceptual. Some critics worry that this renders AlphaGo’s achievement less impressive since its input was partially pre-digested (cf. Marcus (2018b)); but it also shows that the kind of transformations enabled by DCNNs are not narrowly limited to information vehicled in visual or auditory sense modalities. Moreover, AlphaGo’s success at recognizing the kinds of board-wide abstractions that allow it to defeat human grandmasters derives from its DCNNs’ transformational ability to recognize subtle invariances in board configurations across rotations, reflections, and dislocations. Just as with recognizing features in natural images, Go strategies need to be nuisance-tolerant, for game-relevant abstractions like “influence,” “connection,” and “stability” are largely preserved across rotations and small shifts in spatial location—and so this appears to be a quite domain-general capacity of DCNNs that exceeds perceptual domains.
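The relevant nuisance group for a square board is small and fully enumerable: four rotations, each optionally reflected, for eight symmetries in total. The sketch below generates them for a toy 3x3 board given as a list of rows (AlphaGo-style training pipelines exploited exactly this eight-fold symmetry for data augmentation; the board encoding here is my own illustration):

```python
# The eight symmetries of a square board: the nuisance group under which
# Go abstractions like "influence" and "connection" are largely preserved.

def rotate(board):
    """Rotate the board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect(board):
    """Mirror the board left-to-right."""
    return [row[::-1] for row in board]

def symmetries(board):
    """All eight rotation/reflection variants of the board."""
    out, b = [], board
    for _ in range(4):
        out.append(b)
        out.append(reflect(b))
        b = rotate(b)
    return out

board = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
variants = symmetries(board)
```

Because the group is finite, a network can be trained to treat all eight variants as the same position, which is a fully tractable special case of controlling for nuisance variation.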
Furthermore, we might question whether the average human really evaluates complete permutations or transformations of a domain in order to exhibit whatever facility humans reliably show with inferences involving categories such as number, object, agent, or cause. Human cognition is notoriously plagued by errors with respect to these categories; we are unreliable at distinguishing correlation from causation, subject to perceptual illusions and systematic errors about objects and their physical properties (Kaiser et al., 1986; Kubricht et al., 2017), influenced by mathematically irrelevant aspects of equations such as physical spacing of symbols and whether the problem is phrased in a social domain or in the abstract (Fiddick et al., 2000; Landy et al., 2014), and prone to anthropomorphizing much of the natural world (Epley et al., 2007). These biases and persistent errors suggest that human cognition also relies on heuristics and perceptual scaffolding even in areas of human mental activity that the nativists take to be their natural dominions. Moreover, our own cognitive mechanisms are notoriously opaque, and introspection is famously unreliable (Nisbett and Ross, 1980). Motivated reasoning has a tendency to lead us—and perhaps even especially professional philosophers and psychologists, in what the empiricist William James dubbed the “psychologist’s fallacy” (Ashworth, 2009)—to presume that human experimental subjects approach the natural or social world with the same level of explicit theorizing and logical rigor as the academics writing the books and articles about them. This is a form of error I have elsewhere dubbed “anthropofabulation” (Buckner, 2013)—a way of looking at human cognition with rose-tinted glasses that can cause us to set criteria for the possession of intelligence, rationality, abstraction, or other mental properties to a level that exceeds even average human performance.
With this warning in mind, consider how well the proposed mechanism for transformational abstraction answers the challenges we posed above to abstraction-as-subtraction and abstraction-as-composition in section 15.2. In particular, do we now have good answers as to how a DCNN or brain determines which aspects of a particular to subtract, or which simple features to compose, in building more abstract composites? The short answer is that a DCNN determines this through trial and error in the process of training; no oracle tells it the right combinations to explore. It simply adjusts link weights gradually through gradient descent learning, retaining those changes that reliably reduce its overall error function and discarding those that do not. This answer prompts one of the common criticisms that nativists launch at deep learning—that it is too “data hungry,” requiring far more training samples to solve such problems than do human learners (Lake et al., 2017; Marcus, 2018b; Mitchell, 2019). A bone that nativists typically throw to DNN modelers at this point is the proposal that their deep learning may be biologically plausible, but only if it is seen to model the entire search process explored by millions of years of evolution, rather than the much briefer learning process that could be attributed to the cognitive development of a single human.
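The trial-and-error answer can be compressed into a few lines: a single linear unit adjusting one weight by gradient descent, keeping whatever changes lower its error. The data, learning rate, and step count below are illustrative:

```python
# Minimal gradient descent on a single linear unit: no oracle supplies the
# right weight; steps against the error gradient are retained because they
# reliably reduce the loss.

data = [(x, 3.0 * x) for x in range(1, 6)]   # target relation: y = 3x

def loss(w):
    """Mean squared error of the unit's predictions."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w):
    """Gradient of the loss with respect to the weight."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

w = 0.0
history = [loss(w)]
for _ in range(100):
    w -= 0.01 * grad(w)    # small step against the error gradient
    history.append(loss(w))
```

It is precisely the number of such error-driven updates, scaled up to millions of weights, that drives the "data hungry" complaint and the evolutionary "bone" just mentioned.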
Savvy empiricists should turn up their noses at this meager offering (Botvinick et al., 2017), insisting instead that the nativist’s concerns about the size of the training set must be inspected for anthropofabulation. This nativist worry is just a reincarnation of a “poverty of the stimulus” argument that has underestimated the richness of the stimulus for decades (Reali and Christiansen, 2005). The empiricist rebuttal should always be either to scrutinize the learning environment more carefully, asking the nativists whether they have really provided the ethological evidence that humans require no perceptual or experiential scaffolding to acquire their general concepts, or to provide evidence that DNNs might also acquire these representations when provided with biologically-plausible experience. Recently, both human psychology and machine learning research have suggested that we have undercounted and overlooked important sources of additional training exposures in human cortical learning. For one, we need to be careful how we count exemplars; training DCNNs directly on successive video frames rather than still images has supported the idea that thousands of different vantage points on the same object can be treated as many different exemplars of the same category for the purposes of training (Lotter et al., 2016; Luc et al., 2017; Orhan et al., 2020). Moreover, neuroscientific research has suggested that humans replay past training episodes and even generate novel imagined experiences that can be used for additional training during memory consolidation, which occurs during sleep and daydreaming for months and years after initial training episodes (Blundell et al., 2016; Gluck and Myers, 2001; Gupta et al., 2010).
Combining just these two sources of additional training can increase the amount of training exposures that humans experience by several orders of magnitude, and DCNN systems built according to these principles have already dramatically increased their sample efficiency in training.
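The arithmetic behind that orders-of-magnitude claim is easy to sketch. All of the numbers below are invented placeholders for the two effects just described (video frames as distinct exemplars, and replay during consolidation), not measurements from any study:

```python
# Back-of-envelope estimate of undercounted training exposures.
import math

still_exemplars = 100            # naive count: labeled snapshots of one category
frames_per_encounter = 30 * 10   # ~10 s of video at 30 fps, each frame an exemplar
replay_factor = 20               # each episode replayed during sleep/daydreaming

naive_count = still_exemplars
enriched_count = still_exemplars * frames_per_encounter * replay_factor

# How many orders of magnitude were missing from the naive count:
orders_of_magnitude = int(math.log10(enriched_count / naive_count))
```

Even with conservative placeholder values, the enriched count dwarfs the naive one, which is the shape of the empiricist's reply to the "data hungry" charge.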
To move this debate forward, I suggest that both sides should agree upon what has been learned so far: there are philosophically and psychologically significant forms of abstraction that are implemented by DCNNs and other DNN architectures. These processes allow these systems to learn significant abstractions from input data and possibly explain how their categorization and decision processes can non-accidentally generalize well to novel natural data. Future philosophical and empirical attention should be focused on relating these forms of transformational abstraction to full abstraction-as-invariance and its use in the remaining areas where humans outperform DNN-based agents: on problems of causal and prospective reasoning, logical and mathematical proof, and advanced social cognition. The remaining question—which remains an open one for both empiricists and nativists, in both philosophy and cognitive science—is whether these forms of transformational abstraction can be bootstrapped with the right experience or domain-general biases to also extract the causal, modal, and meta-representational relations thought to drive these forms of higher cognition in humans (Penn et al., 2008). Empiricists should not rest content with “hail Mary” passes to ever-larger datasets and ever-deeper networks, and rationalists must not rest content by pointing to the sometimes dramatic failures of current architectures on problems that exceed the abilities of even average adult humans.
This battle will be won or lost by more rigorously examining the developmental scaffolding that human children rely on to bootstrap their own core knowledge systems. To do so, we must draw upon the most sophisticated data collection tools at our disposal to conduct the highest-quality investigations. We must set aside unreliable intuitions and commonsense assumptions about the poverty or richness of human developmental experience and create new multi-modal datasets from sources such as headcams worn by infants during naturalistic learning interactions (e.g., SAYCam, a large, longitudinal dataset recorded from head-mounted cameras worn by infants aged six to thirty-two months as they learned from over 200,000 word-utterances in naturalistic speech), which we are only now beginning to use to train new DNNs (Orhan et al., 2020; Sullivan et al., 2020; Vong and Lake, 2020; Yoshida and Smith, 2008). We must also think more creatively about ways that brains can reuse old experiences and generate novel simulated ones to make more efficient use of human-like amounts of experience for further rounds of training and planning using domain-general machinery (Hassabis et al., 2017). Only by resisting the subtle influence of anthropofabulation and creating philosophically grounded, ethologically informed, and neuroscientifically inspired engineering experiments can we continue to make progress on one of the oldest questions in philosophy and cognitive science (Buckner, 2020; Rahwan et al., 2019).
1. By a “deep neural network,” I mean an artificial neural network that consists of more than three hidden layers of nodes interposed between input and output. By “usefully model,” I mean “depict in a way that allows us to engage in successful surrogative reasoning about the target (in this case, abstraction in the mind-brain),” such as by making novel and true empirical predictions about human abstraction and/or by explaining how humans abstract, where those predictions and explanations are based on similarities between properties of the model and relevant properties of the mind or brain. Though this is the view defended here, it is worth noting that this “representational similarity” view of scientific modeling is no longer hegemonic in philosophy of science, and even if DNNs should fail to be an accurate representation of abstractive processes in the mind-brain, there are other senses in which they might still be good scientific models (Downes, 2011). Articulating the notion of abstraction that is targeted by this modeling exercise is the main task of this chapter.
2. We might debate whether Locke is a justification empiricist, but he is definitely an origin empiricist (Odegard, 1965).
3. Attention-based transformer architectures like GPT-3 and BERT are starting to acquire a reputation as the most promising all-purpose architecture. For example, though they were previously considered mostly in the realm of language modeling, they have featured prominently in recent breakthroughs such as AlphaFold 2’s dramatic improvement in predicting the course of protein folds (Jumper et al., 2020).
4. They also almost always involve regularization techniques, but I will not comment more on that here; for details, see Buckner (2019a).
5. Notably, the highly successful attention-based transformer architecture does not typically use pooling. However, its key use of “attention” to iteratively focus on subsets of especially relevant input may be seen to achieve abstraction-as-subtraction through other means; for more details, see Lindsay (2020).
Julia Haas
2023
In his original introduction to Mind Design, John Haugeland observes that “an ‘experiment’ in mind design is more often an effort to build something and make it work, than to observe and analyze what already exists” (Chapter 2, this volume). But what happens when such an experiment in mind design succeeds in a way and to a degree that few could have predicted? And more to the point, how do we take on the implications of such successes when they challenge central features of how we understand the mind?
These are the circumstances that we find ourselves in with respect to developments in reinforcement learning. Over the past twenty-five years, reinforcement learning has had a tremendous impact on the development of artificial intelligence and has been a major driver in advancements in the so-called ‘decision sciences’—computational neuroscience, neuroscience, psychology, psychiatry, and economics. But even as we continue to advance the notion of reward maximization as a general solution to the problem of artificial intelligence (Silver, 2015), we have not yet embraced the full implications of reinforcement learning, together with the accompanying reward-prediction hypothesis, for our conceptions of the mind. That is, we continue to think of the mind as some form of a thinking machine (e.g., “thinking, intellect,” Haugeland, Chapter 2, this volume), where such thinking is best understood as some type of computation—ecumenically including neural networks, deep learning, genetic algorithms, and so on.
I propose that the successes and contributions of reinforcement learning urge us to see the mind in a new light, namely, to recognize that the mind is fundamentally evaluative in nature. There are weaker and stronger versions of this thesis.
The weaker version, which I commit to here, proposes that the mind is, at a fundamental level, in the business of evaluating states of affairs as better or worse. This version is additive in nature: it says that, in addition to performing computations over representations of descriptive matters of fact, the mind also performs computations over representations of those facts as better or worse.1 But even merely recognizing this heretofore missing piece of the puzzle transforms our understanding of many central aspects of our cognitive experience.
The stronger version, which I explore but ultimately don’t subscribe to, makes a revisionary rather than an additive claim: it proposes that the mind is at bottom evaluative in nature. This is to say that the mind’s evaluative processes are conceptually prior to its perceptual, cognitive, or motor processes. In this sense, the stronger thesis is a type of grand unifying theory for understanding the mind. Notably, the stronger version is related to but distinct from the so-called ‘reward is enough’ hypothesis, which suggests that reward maximization is sufficient to “drive behavior that exhibits most if not all abilities that are studied in natural and artificial intelligence” (Silver et al., 2021, 1).
Even without the stronger version, reinforcement learning points us to the idea that, as living organisms, we not only continually experience the world, but experience it as better and worse. As Haugeland (1979, 619) puts it, the problem with classical computers is that they “don’t give a damn.” Montague (2006, 19) similarly suggests that the central difference between computers (as we have more traditionally conceived of them) and brains is that the latter use evolved, efficient computations that “care—or more precisely, [that] have a way to care.” In my view, these notions of ‘giving a damn’ or ‘caring’ are basically right: minds assess with respect to some goals, i.e., they ‘care’ about how things are going with respect to those goals, be they as central as survival or as mundane as getting coffee.
Still, we need a much more systematic way of working out what this actually means. Moreover, if we do in fact experience the world in this way—that is, evaluatively—then this will have important implications for understanding how many of our cognitive capacities function, e.g., why perception and attention select as they do; and, equally, why these capacities break down as they do, e.g., how Major Depressive Disorder may involve both a reduction in the primary sensitivity to rewards and an individual’s reduced ability to learn from reward (Huys et al., 2013). Developing this picture is the work I aim to do here.
I build my argument out over stages. For precision, I make several assumptions about the nature of reinforcement learning and its instantiation in minds like ours. I sketch these assumptions, together with their relationship to other versions of reinforcement learning, in Section 16.2. I then briefly survey some of the empirical evidence suggesting that the reinforcement learning paradigm captures something important about biological minds like ours.
In Section 16.3, I get more specific about what that ‘something important’ is. I do so by characterizing the nature of valuation in the mind, defending the function of valuation as guiding selection and providing evidence for the ubiquity of valuation as selection across a wide range of ‘low-’ and ‘high-level’ human psychological capacities.
In Section 16.4, I defend the weaker version of the evaluative thesis. I sketch what we might expect from a strictly ‘thinking’ mind on the one hand, and from a thinking, evaluative mind on the other. I suggest that we find plenty of evidence for the latter in a variety of cognitive capacities.
In Section 16.5, I consider the stronger thesis, mapping out how an argument for it might go. I suggest it is a thesis well worth bearing in mind, particularly as we continue to make advancements in artificial intelligence. Nonetheless, I suggest that we presently lack the necessary evidence to subscribe to it wholesale and raise some challenges for securing it going forward.
In Section 16.6, I briefly conclude by addressing what Haugeland calls the common complaint regarding artificial intelligence. According to Haugeland, the complaint suggests that artificial intelligence “pays scant attention to feelings, emotions, ego, imagination, moods, consciousness” (this volume, p. 33). I show how by adopting an evaluative account, we can not only illuminate core aspects of minds like ours, but equally appeal to powerful, computational frameworks to design many (though not all) of the features Haugeland refers to into artificial agents.
To start, let’s look at the narrow end of the argumentative wedge, namely, a basic sketch of reinforcement learning.
We can think of reinforcement learning as a research question, as a research program, and as a set of computational tools. As a research question, sometimes called the ‘learning problem,’ reinforcement learning asks how an agent can optimize its behavior by learning from interactions with its environment. For example, how does a baby plover learn the contours of its environment simply by hopping around in it? Or again, how does a newcomer to London find her way around, just by using a map and a bit of trial-and-error? As a research program, reinforcement learning refers to a branch of computer science, together with associated interdisciplinary approaches, that analyzes formal versions of this question and develops computational solutions to it (Dayan and Abbott, 2001; Glimcher and Fehr, 2013). Finally, reinforcement learning methods are the suites of computational algorithms that aim to solve the aforementioned learning problem (Sutton and Barto, 2018a).
As a research program, the reinforcement learning framework makes certain foundational and technical assumptions, with specific versions of the framework committing to some assumptions while suspending or relaxing others. Here, I sketch what I call the ‘reinforcement learning and decision-making’ (RLDM) framework, drawing on assumptions made in both machine learning and computational neuroscience.2 Specifically, in addition to assuming many of the somewhat more basic features of the general framework, this version assumes that reinforcement learning is to some degree meaningfully instantiated in the minds of biological organisms, and takes a particular if minimal view regarding the problem of specifying where rewards come from in biological systems. Throughout, it will be useful to remember that this is just one variant of the general framework among many—though perhaps one that is particularly philosophically useful.
Let’s start with the basic ingredients. In a reinforcement learning framework, we have an agent and an environment. The agent is the learner or decision-maker in question, and it selects different actions in its environment, where actions can be understood as “any decisions we want to learn how to make,” including mental actions (Sutton and Barto, 2018a, 50). The environment refers to everything ‘outside’ of the agent, which the agent cannot arbitrarily change but rather with which the agent interacts. (For example, in many cases, even parts of the agent’s body are considered to be a part of the environment.) The agent and the environment interact in the sense that the agent is presented with sensory information from the environment, and the agent chooses among different actions within the environment (picking what is called a ‘state-action pair’). The environment is then affected by these actions, and the process is (usually) iterated. Notably, the agent may not be able to observe the complete environment and may have no prior knowledge of the environment’s dynamics. In addition, the agent may, but by no means needs to, build a model of the environment in order to choose actions in and learn from it.
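The agent–environment loop just described can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: a toy two-action environment with hidden reward probabilities, and an epsilon-greedy agent; none of these particular names or numbers comes from the chapter.

```python
import random

class BanditEnvironment:
    """A toy two-action environment. Its reward probabilities are hidden
    dynamics the agent cannot observe or arbitrarily change, only interact with."""
    def __init__(self):
        self._p = {"left": 0.2, "right": 0.8}  # unknown to the agent

    def step(self, action):
        # The environment passes a scalar reward back to the agent.
        return 1.0 if random.random() < self._p[action] else 0.0

class EpsilonGreedyAgent:
    """The agent keeps a value estimate per action, mostly exploiting the
    best-looking action and occasionally exploring."""
    def __init__(self, actions, epsilon=0.1, learning_rate=0.1):
        self.q = {a: 0.0 for a in actions}
        self.epsilon = epsilon
        self.learning_rate = learning_rate

    def act(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))  # explore
        return max(self.q, key=self.q.get)      # exploit

    def learn(self, action, reward):
        # Nudge the estimate toward the observed reward.
        self.q[action] += self.learning_rate * (reward - self.q[action])

random.seed(0)
env = BanditEnvironment()
agent = EpsilonGreedyAgent(["left", "right"])
for _ in range(2000):
    action = agent.act()
    reward = env.step(action)
    agent.learn(action, reward)
# The agent's estimates come to reflect the hidden probabilities.
print(agent.q)
```

Because the environment’s dynamics are hidden, the agent can only improve by interacting: choosing actions, receiving rewards, and revising its estimates.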
A distinguishing feature of the reinforcement learning framework is the role of reward. Roughly, in reinforcement learning, the agent’s objective in the environment is to maximize the cumulative reward it receives over time, where rewards are passed from the environment to the agent. In their influential text, Sutton and Barto call this framing the reward hypothesis, specifying, “all of what we mean by [an agent’s] goals and purposes can be well thought of as the maximization of expected value of the cumulative sum of a received scalar signal (called reward)” (Sutton and Barto, 2018a, 53). That is, the agent’s objective is to maximize its yield of reward as it acts in the world. This objective is characterized by assigning a quantity of intrinsic desirability to each state (or to taking each action in each state). This intrinsic desirability is known as the reward.3
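The quantity named by the reward hypothesis, the expected cumulative sum of a received scalar reward signal, is standardly computed with a discount factor. A minimal sketch (the reward sequence and the discount factor are illustrative choices of mine):

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted sum of a scalar reward sequence:
    G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    With gamma < 1, nearer rewards count for more than distant ones."""
    g = 0.0
    for r in reversed(rewards):  # fold from the end of the trajectory
        g = r + gamma * g
    return g

# A trajectory whose only reward arrives at the end (e.g., reaching a goal):
print(discounted_return([0, 0, 0, 1], gamma=0.5))  # 0.5**3 = 0.125
```

Maximizing this quantity over time, rather than any single immediate reward, is what the agent’s goals and purposes amount to on the reward hypothesis.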
This intrinsic desirability assigned to each state (or taking each action in each state), or reward, can be contrasted with the notion of value, which captures the expected, discounted sum of future reward associated with each state (or each action in each state), conditional on a certain policy of action. We can elucidate the distinction between reward and value further using an example adapted from Silver (2015). Imagine an agent trying to find a door. Upon arriving at the door for the first time, the agent receives a reward from the environment. But this reward can also be used to assess how relatively good (valuable) individual states are expected to be to the extent that, conditional on a certain action policy, they lead to the door and hence the reward. Hence, an agent’s ongoing interactions with its environment enable it to continually revise the value attributed to a given state or state-action pair conditional on a certain policy, upgrading or downgrading as needed. This enables the agent to learn the most appropriate actions in the most appropriate states to maximize cumulative reward over time, conditional on a certain policy, in spite of the fact that states (or state-action pairs) can be of high value without being intrinsically worthwhile (i.e., rewarding).
We can take the example of making and having coffee to help illustrate how a state can carry high expected value while nonetheless not being technically rewarding. Although only the drinking of the cup of coffee itself may be intrinsically worthwhile (rewarding), and the grinding of the beans almost certainly is not, the state-action pair of grinding the coffee is nonetheless associated with expected value because, conditional on a certain policy, it is a necessary step or state-action pair on the way to having the coffee.
The distinction between valuable and rewarding states partly helps explain why not every state in an environment needs to be directly rewarding in order for an agent to act appropriately within it.
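Under some simplifying assumptions of my own (a five-state corridor, a deterministic ‘always step right’ policy, and reward only on the transition into the final ‘door’ state), the door example can be sketched as iterative policy evaluation, showing how value propagates backward to states that are not themselves rewarding:

```python
GAMMA = 0.9  # discount factor
N = 5        # corridor states 0..4; state 4 is the door (terminal)

V = [0.0] * N
# Iterative policy evaluation for the deterministic "always step right" policy:
# repeatedly back up each state's value from its successor's value.
for _ in range(100):
    for s in range(N - 1):
        reward = 1.0 if s + 1 == N - 1 else 0.0  # reward only on reaching the door
        V[s] = reward + GAMMA * V[s + 1]
    V[N - 1] = 0.0  # terminal state: no further reward to come

print([round(v, 3) for v in V])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

Only the step into the door state carries reward, yet every earlier state acquires positive value because, conditional on the policy, it leads there; states nearer the door are valued more.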
As a branch of machine learning, reinforcement learning represents the foregoing conceptual features in computational terms. There are countless reinforcement learning algorithms, each with a distinctive computational profile. For example, the temporal-difference learning algorithm (TD) represents a computationally efficient way of making predictions about reward in the future. One way to improve predictions over time is to make a prediction, observe the actual outcome, compare the difference (or error) between the two, and then update the estimates that led to the initial prediction.
To borrow an example from Sutton (1988, 10), suppose you are a weather forecaster in a monotonous climate, charged with making a prediction each week about the chance of rain on the coming Saturday. Each week, you gain more information about the local weather patterns, allowing you to refine your predictive powers. That information could be used in different ways. You could make a prediction on Monday about the weather on Saturday, wait until Saturday, and then update Monday’s prediction based on the difference between Monday’s prediction and Saturday’s actual weather. The temporal difference approach does something a little neater by updating its predictions throughout the week. Having made a prediction Monday about the weather on Saturday, TD lets you compare Monday’s prediction to Tuesday’s prediction about Saturday and adjust your Monday predictions accordingly. For instance, if Monday’s prediction for Saturday is a 90% chance of rain, but Tuesday’s prediction for Saturday is only a 60% chance, then the temporal difference approach is to lower the Monday prediction for subsequent weeks with similar indicators.4
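A minimal sketch of this one-step update, where successive predictions of the same outcome are compared; the learning rate of 0.5 is an illustrative choice of mine:

```python
def td_update(pred_today, pred_tomorrow, learning_rate=0.5):
    """One-step temporal-difference update: nudge today's prediction toward
    tomorrow's prediction of the same outcome, without waiting for Saturday."""
    error = pred_tomorrow - pred_today  # the temporal-difference error
    return pred_today + learning_rate * error

monday = 0.9   # Monday's forecast: 90% chance of rain on Saturday
tuesday = 0.6  # Tuesday's forecast for the same Saturday
print(round(td_update(monday, tuesday), 2))  # 0.75: Monday's forecast is lowered
```

The benefit is that learning happens at every step from the change in successive predictions, rather than only once the final outcome is observed.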
Given that different problem settings present different challenges, there are myriad different RL algorithms in use today. These trade off factors such as memory consumption, computation cost, data efficiency, and stability; some are useful for very small environments, and others are useful for very large environments; some for discrete action spaces, and others for continuous ones.5 Thus, ‘reinforcement learning’ refers to a general learning problem and a suite of computational algorithms, as well as to the branch of computer science devoted to studying them, rather than to any token solution to the problem.
The RLDM version of reinforcement learning adds two assumptions to the basic reinforcement learning framework sketched above. First, it assumes a relationship between reinforcement learning and the minds of biological creatures like us. This assumption is by no means universally held: researchers in machine learning can pursue decades of research and remain entirely agnostic regarding the role of reinforcement learning in biological agents. Similarly, cognitive and comparative psychologists can study the nature of learning and behavior without any appeals to the reinforcement learning framework. However, RLDM follows computational neuroscientists and other decision scientists who suspect that reinforcement learning does, in fact, capture something special about minds like ours. As Dayan and Niv put it, reinforcement learning appears to offer
More than just a computational, ‘approximate ideal learner’ theory for affective decision-making. [Reinforcement learning] algorithms, such as the temporal difference (TD) learning rule, appear to be directly instantiated in neural mechanisms, such as the phasic activity of dopamine neurons. That [reinforcement learning] appears to be so transparently embedded has made it possible to use it in a much more immediate way to make hypotheses about, and retrodictive and predictive interpretations of, a wealth of behavioral and neural data collected in a huge range of paradigms and systems. Dayan and Niv (2008, 1)
Notably, we are free to relax the condition that reinforcement learning is directly instantiated in the workings of the brain. It is sufficient to say that reinforcement learning provides remarkably useful frameworks for thinking about decision-making and selection in the mind.
RLDM’s second assumption has to do with the subjective nature of reward. As noted above, in the basic reinforcement learning framework, rewards are passed from the environment to the agent when an agent enters certain states of the environment, or when the agent takes certain actions in certain states. This external nature of reward is unproblematic in the context of machine learning because the reward is simply designed by the researcher as a means of communicating what the researcher wants the artificial agent to achieve. But things get thornier when considering biological organisms, since it’s not clear where rewards come from. This question regarding the origin of reward in biology generates what Juechems and Summerfield (2019) call the paradox of reward. The issue is paradoxical, the authors contend, because,
No external entity exists that can directly quantify the consequences of each action, like the points that are awarded in a video game for completing levels or shooting monsters. Nor is it obvious that biological systems have a dedicated channel for receipt of external rewards that is distinct from the classical senses. Rather, rewards and punishments are sensory observations—the taste of an apple, the warmth of an embrace—and so stimulus value must be inferred by the agent, not conferred by the world. In other words, rewards must be intrinsic, not extrinsic. (2019, 837-838)
Exactly how this conversion between sensory observations and assignments of intrinsic rewards occurs—assuming that it occurs at all—remains the subject of lively theoretical debate. One possible explanation is that minds like ours have evolved specific mechanisms that convert sensory observations into hedonic signals (e.g., see Schultz, 2015). Another, complementary possibility is that, in addition to the evolved mechanisms for basic rewards (e.g., food and water), human beings develop cognitive setpoints, akin to homeostatic setpoints, on which reward amounts to a by-product of computing the distance to self-defined goals (e.g., such as getting married or going to graduate school) (Juechems and Summerfield, 2019). Here, RLDM again takes a minimal approach, and merely assumes that minds like ours subpersonally assign subjective rewards to, e.g., sensory observations, albeit indirectly; it remains provisionally agnostic about how this assignment takes place.
Let’s explore the first assumption in more depth. In what sense does RLDM provide a distinctive, interpretive lens for cognitive neuroscientific evidence?
As gestured at above, arguably the most significant connection is between RLDM and the reward system in the mammalian brain. In the mid-1990s, theoretical and empirical work showed that the firing of dopamine neurons is accurately approximated by the temporal difference learning algorithm (for narrative accounts of the discovery, see Montague, 2006; Redish, 2013; Colombo, 2014). That is, dopamine neurons fire when an organism experiences a higher- or lower-than-expected value in association with a given state (Schultz et al., 1997). This discovery provides the foundation for the so-called reward prediction error hypothesis of dopamine neuron activity, which holds that “one of the functions of the phasic activity of dopamine-producing neurons in mammals is to deliver an error signal between an old and a new estimate of expected future reward to target areas throughout the brain” (Sutton and Barto, 2018a, 381).
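The error signal named by the reward prediction error hypothesis is standardly written as delta = r + gamma * V(s') - V(s). A minimal sketch, with an illustrative discount factor and toy values of my own:

```python
def reward_prediction_error(reward, v_next, v_current, gamma=0.95):
    """One-step temporal-difference error. On the hypothesis discussed here,
    a positive error corresponds to a phasic dopamine burst (better than
    expected) and a negative error to a dip (worse than expected)."""
    return reward + gamma * v_next - v_current

# An unpredicted reward arrives: better than expected, positive error.
surprise = reward_prediction_error(reward=1.0, v_next=0.0, v_current=0.0)
# A predicted reward is omitted: worse than expected, negative error.
omission = reward_prediction_error(reward=0.0, v_next=0.0, v_current=1.0)
# A fully predicted reward: no error, no phasic response.
predicted = reward_prediction_error(reward=1.0, v_next=0.0, v_current=1.0)
print(surprise, omission, predicted)  # 1.0 -1.0 0.0
```

These three cases mirror the classic experimental findings: dopamine neurons respond to unpredicted rewards, pause when predicted rewards are omitted, and fall silent to fully predicted rewards.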
This seminal finding in turn led to the use of reinforcement learning methods to study the neuroscience of vision (Hayhoe and Ballard, 2005; Hikosaka et al., 2006; Hickey et al., 2010), attention (Della Libera and Chelazzi, 2009; Chelazzi et al., 2014; Anderson and Kim, 2018), memory (Patil et al., 2017; Ergo et al., 2020), prospective memory (Krishnan and Shapiro, 1999; Katai et al., 2003; Kliegel et al., 2005; Walter and Meier, 2014), cognitive control (Savine and Braver, 2010; Chiew and Braver, 2014; Cubillo et al., 2019), and above all, decision-making (Sutton and Barto, 2018a; Dayan and Niv, 2008; Rangel et al., 2008; Dayan, 2011; Glimcher and Fehr, 2013).
For example, a systematic body of evidence now indicates that the reward system guides visual fixation and saccadic eye movement, i.e., what we look at, when, and in what order (Liao and Anderson, 2020). Similarly, reward guides what we do or don’t attend to more precisely than do either location or salience (Anderson and Kim, 2018). Conversely, deficits and disruptions (e.g., by addictive substances) to the reward system are not only implicated in diseases such as Parkinson’s and Tourette’s, but also in a range of psychiatric disorders, including depression (Huys et al., 2015) and addiction (Hyman, 2005; Redish et al., 2008; Redish, 2013). Arguably, methods from reinforcement learning thus represent an important and, to date, under-utilized framework for elucidating the nature and mechanisms underlying selection between competing states of affairs across a range of ‘low’- as well as ‘high-level’ kinds of cognitive processing.
When proponents say there’s something special about RLDM, then, they tend to point to one or both of the following considerations. First, reinforcement learning algorithms successfully predict and characterize the workings of the reward system; by contrast, other approaches, including predictive processing (see Clark, Chapter 17, this volume), often provide merely retrodictive explanations of known phenomena.
Second, the reward system appears to play an outsized role in a range of cognitive capacities, from sensation through to economic choice. The question is, what’s the best way of characterizing this role of reward and value in the mind, from a philosophical point of view?6
In principle, the role of reward can be characterized at multiple levels of explanation and across multiple, co-dependent theoretical domains, including in computational terms, cellular- and systems-neuroscientific terms, cognitive neuroscientific and neuroeconomic terms, and psychological and behavioral terms (Hochstein, 2016). For instance, as discussed above, we can capture the role of reward and value in computational terms using methods from reinforcement learning (for an overview, see Sutton and Barto (2018a), though see also hybrid approaches, such as that put forward in Gershman (2015)). Or again, following the Schultz et al. (1997) discovery, we can characterize reward and value in cellular- and systems-neuroscientific terms, both in terms of dopaminergic functioning as well as in terms of the more general, system-level neural analyses of the reward system in the brain. At a ‘higher’ level still, we can characterize reward and value in cognitive neuroscientific and neuroeconomic terms, drawing on behavioral experiments and fMRI data, and using constructs such as ‘decision-making,’ ‘motivation,’ and ‘willingness-to-pay.’ And so on.
In what follows, I characterize the role of reward and value in the mind at roughly a ‘conceptual’ level of explanation, i.e., at a coarseness of grain typical in the philosophy of mind. Accordingly, my argument also broadens out at this stage, moving from the specifics of RLDM and associated empirical evidence to a more traditional, philosophical characterization—namely, to characterize a cognitive process I’ll call valuation. This is essential for future work in the philosophy of mind, e.g., to enable us to distinguish and understand the relationship between, say, valuation and the philosophical folk psychological notion of desire (for work in this spirit, see Schroeder, 2004; Arpaly and Schroeder, 2014), or again, to enable us to distinguish and understand the relationship between valuation and the various notions of affect, mood, and emotion (for a philosophical discussion of emotion see, e.g., Scarantino and Sousa, 2021).
In this way, the resulting characterization of valuation in some cases complements and in some cases revises the traditional conceptual machinery used to describe and understand the mind and minds like ours.
Recall from the previous section that in basic reinforcement learning, reward is some quantity assigned to represent the intrinsic desirability of each state (or of taking each action in each state), and which is conveyed to an agent when they reach that state. Further, this intrinsic desirability assigned to each state (or taking each action in each state), or reward, can be contrasted with the notion of value, which captures the expected, discounted sum of future reward associated with each state (or each action in each state), conditional on a certain policy of action. So, while coffee is intrinsically rewarding for me in the morning, grinding coffee or getting milk is not—but these latter states are nonetheless valuable to the degree that, conditional on my action policy, they lead me to my cup of coffee.
Recall in addition that, according to RLDM, the reward hypothesis captures something special about the mind, namely, the substantial role of the reward system in the mammalian brain, where the reward system is itself implicated in a wide range of ‘low-’ and ‘high-level’ cognitive capacities.
I argue that if both of these claims are right, then we can use RLDM and the corresponding empirical evidence to revise our philosophical understanding of what the mind is doing, how it is going about it, and what this kind of processing is for.
Let’s start with the ‘what.’ Very simply, I argue, the mind engages in valuation. Informally, I take this to mean that the mind continually attributes reward and value to a range of sensations, perceptions, actions and so on—essentially forming a kind of evaluative layer over the features of its experience.
In more technical terms, I argue that valuation refers to the subpersonal attribution of goal- and context-dependent subjective reward and value to internal and external stimuli. Valuation is subpersonal in the sense that it demarcates a causal rather than an intentional mechanism (Dennett, 1969; Drayson, 2014). This is key: the mind routinely, mechanistically assesses states of affairs as better or worse.7 Further, it is goal- and context-dependent in the sense that what is rewarding or valuable depends on what the agent is trying to do, and when and where the agent is trying to do it. For example, if my goal is to wake up and have a productive day, then drinking a cup of coffee first thing in the morning is valuable. But if my goal is to rest and get a good night’s sleep, then drinking a cup of coffee late at night is not. It is subjective in the sense that what is considered rewarding and/or valuable is agent-relative; while this author finds coffee rewarding, many individuals do not. And the term stimuli here is intended as a broad catch-all: reward and value can be attributed to external objects (commodities), states, state-action pairs, and action policies, but also to internal states of affairs, such as experiences, feelings, and moods.
In terms of the ‘how,’ valuation is realized in a number of complementary ways. One important way is through the retroactive attribution of value to states that lead to reward in subsequent states. Recall the task of walking to a nearby door in the previous section. Upon arriving at the door for the first time and therefore receiving or experiencing the reward, there occurs a subpersonal, retroactive attribution of value to the antecedent states that then led to the reward. That is, there occurs a subpersonal, retroactive attribution of value to the penultimate state, derived from the reward associated with arriving at the ‘ultimate’ state, i.e., the door. This retroactive attribution in turn continues to feed backwards, i.e., there occurs the subpersonal, retroactive attribution to the antepenultimate state, and so on. In this way, ongoing interactions continue to revise the value attributed to a given state or state-action pair, upgrading or downgrading as needed. For instance, if the baby plover finds a new trove of bugs, the value of a certain path leading to the beach can increase. But values can also be computed ‘on the fly’ (Balleine and Dickinson, 1998; Langdon et al., 2018), relative to features of context (Hunter and Daw, 2021), and with respect to imagined or expected future states (Gagne and Dayan, 2022; Russek et al., 2021). For instance, if the newcomer to London is traveling from Green Park to Russell Square and Holborn Station is under renovation, the value of taking the blue line decreases.
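This retroactive, backwards-feeding attribution is just what a temporal-difference update computes. Below is a minimal TD(0)-style sketch of the door-walking example; the state names, reward, and learning parameters are invented for illustration, not taken from the chapter.

```python
# TD(0)-style sketch of retroactive value attribution: reward received
# at the door feeds backwards to the states that led to it.
# States, reward, and parameters are invented for illustration.
ALPHA, GAMMA = 0.5, 0.9
states = ["start", "hallway", "door"]  # visited in this order
reward_at = {"start": 0.0, "hallway": 0.0, "door": 1.0}

V = {s: 0.0 for s in states}

for _ in range(50):  # repeated walks to the door
    for s, s_next in zip(states, states[1:]):
        # Update the value of the antecedent state from the reward and
        # current value estimate of the state it leads to.
        td_target = reward_at[s_next] + GAMMA * V[s_next]
        V[s] += ALPHA * (td_target - V[s])

# After repeated interaction, value has propagated backwards: the
# penultimate state inherits value from the door, and the
# antepenultimate state inherits (discounted) value in turn.
print(round(V["hallway"], 3))  # 1.0
print(round(V["start"], 3))    # 0.9
```

Ongoing interaction revises these estimates in just the way the text describes: if the reward at the door changed, subsequent walks would upgrade or downgrade the values of the path leading to it.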
Here, the main idea is that the mind continually assesses and reassesses states of affairs as better or worse, constructing and casting, to put things in fairly figurative terms, a kind of evaluative fabric over its states and experiences.
But it’s the ‘what for’ of valuation that is of most interest (as these things tend to go).
The function of valuation in minds like ours, I argue, is to solve for what I call the selection problem, or the problem of selecting between one or more competing alternatives. The selection problem can be described in general terms, insofar as the mind must continually select what to compute, what to sense, what to perceive, what to attend to, what to choose (as an action in the world), and so on. Technically characterized examples of the selection problem include selecting between multiple action controllers (Daw et al., 2005), the problem of perceptual decision-making (Gold and Shadlen, 2007), and the problem of action-based decision-making (Glimcher, 2011). Crucially, as the span of these examples should illustrate, the selection problem occurs ubiquitously in the mind. It occurs at every major stage of mental processing, from sensation and computation to action, and at every level of description of mental processing, from the sub-personal to the personal.
A central, underappreciated upshot of RLDM’s experiment in mind design, I argue, is that the mind selects between available computations, sensations, perceptions, and so on, conditional on attributions of reward and value.
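Selection conditional on attributed value can be sketched with a softmax (Boltzmann) choice rule, a standard device in reinforcement learning models of selection. The percept names and value numbers below are invented for illustration.

```python
# Softmax (Boltzmann) selection: choosing among competing alternatives
# with probability increasing in attributed value. The candidate
# percepts and their values are invented for illustration.
import math
import random

def softmax_select(values, rng, temperature=1.0):
    """Sample one alternative; higher-valued options are chosen more often."""
    names = list(values.keys())
    weights = [math.exp(values[n] / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Two rivalrous percepts; one has been paired with reward.
percepts = {"rewarded_image": 2.0, "neutral_image": 0.0}
counts = {"rewarded_image": 0, "neutral_image": 0}
rng = random.Random(0)
for _ in range(1000):
    counts[softmax_select(percepts, rng)] += 1

# The rewarded percept dominates, without being chosen exclusively.
print(counts["rewarded_image"] > counts["neutral_image"])  # True
```

Note that the rule yields dominance rather than exclusivity, which parallels the binocular rivalry findings discussed below: the rewarded percept is seen more, not always.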
To illustrate, consider the unlikely phenomenon of binocular rivalry. Binocular rivalry occurs when one stimulus is shown to one eye at the same time as a different stimulus is shown to the other. The resulting experience is of the two images alternating back and forth; perceptual dominance in binocular rivalry refers to one of the two images appearing first, or for a longer period of time during the overall duration of the experience of alternation. Notably, both rewarded stimuli and rewarded percepts result in perceptual dominance; that is, participants are more likely to perceive stimuli and percepts associated with a reward (Balcetis et al., 2012; Wilbertz et al., 2014; Marx and Einhäuser, 2015; Haas, 2021). Moreover, a complementary phenomenon occurs for punished percepts: participants experience perceptual dominance for the non-punished percept in the pair, suggesting that the reward or punishment is not simply additional information taken into consideration by Bayes-like predictive processing, as a predictive processing view might suggest (Wilbertz et al., 2014). In this way, the selection of the perceptually dominant percept is directly conditional on the attribution of reward and value in the binocular rivalry paradigm, i.e., on valuation. Participants tend to perceive the most rewarded or valuable stimulus or percept. Hence, when it comes to the cognitive task of selecting ‘what to perceive,’ valuation plays a driving role.
But valuation doesn’t just play a driving role in perception. Rather, when I say that the mind is fundamentally evaluative in nature, I mean that we sense, perceive, and attend to the features of our environment conditional on our distributions of reward and value attribution, as when we attend to rewarded rather than salient or location-based percepts (Anderson and Kim, 2018). We remember, and remember to remember (remember prospectively), conditional on reward (for a useful review, see Walter and Meier, 2014). We allocate our cognitive resources (in cognitive control) conditional on our distributions of reward and value attributions, as shown by the expected value of control account of cognitive control (Musslick et al., 2015). And we decide, choose, and plan our future actions conditional on our distributions of reward and value attributions, as when prior reward experience determines a participant’s willingness-to-pay in everyday economic transactions (Plassmann et al., 2007).
Conversely, when the reward system is impaired, for example, through cell death in the basal ganglia (Parkinson’s) or due to allostatic shift (substance addiction), there are direct, corresponding deficits in selection: e.g., in motor tremors, mood disorders, and executive dysfunction in Parkinson’s disease, and e.g., in cravings, impaired control, and continued use in spite of overwhelmingly negative consequences in substance addiction (for extended discussions, see Redish, 2004; Redish et al., 2008). And so on.
To emphasize, I do not argue that selection is synonymous with valuation. But selection is conditional on valuation: we select or avoid what we learn is better or worse over a life-long course of iteration.8 Moreover, valuation is deployed and redeployed across a range of selection problems in the mind, including selection in sensation, perception, attention, and cognition generally.9
Hence, where I suggested above that the reward system “influences” or is “implicated in” a range of cognitive processing, I can now be much more specific: valuation guides selection across the range of mental processing that occurs in minds like ours.
What, then, of the evaluative thesis—the view that the mind is fundamentally evaluative in nature?
At the outset, I suggested that on the weaker version of the view, the mind encompasses both thinking and evaluation. That is, according to the weaker thesis, the mind does something like ‘see’ two competing stimuli in binocular rivalry, and ‘perceive’ only one of those stimuli at a time, resulting in the signature perceptual experience of perceptual alternation. In a standard case, we might also say that an individual could go on to draw on this perception to form beliefs, draw inferences, and perform all the other kinds of cognitive tasks that are typically associated with, as Haugeland put it, “thinking, intellect” (Haugeland, Chapter 2, this volume), or, as others put it, “intelligence.”
But, on the weaker version of the evaluative thesis, the mind also does something else, without which it would not be the mind it is—namely, it continually assesses things as better or worse, conditional on certain goals and aspects of the environment, in the ways described above, i.e., subpersonally, through various forms of attribution, in a two-place relation, and so on.
In this sense, the weaker thesis doesn’t exactly try to unseat the traditional conception of thinking mind but rather complements it by describing a fundamental cognitive process that has heretofore been relatively overlooked.
I defend the weaker thesis on three grounds.
First, evidence bears out the positive features of the view. A survey of mature, textbook neuroscience suggests that the reward system is indeed implicated in basic biophysical processes such as eating, drinking, and reproduction; in basic cognitive processes such as working memory, executive functioning, and time estimation; and, crucially, in all learned behaviors, ranging from learning-based sensory processing through planning, strategizing, and second-order preference-formation (for a concise review, see Arias-Carrión et al., 2010; for extended discussions, see Glimcher and Fehr, 2013). Equally, the reward system is implicated in the kinds of ‘sophisticated’ cognitive processes that are often of interest to philosophers, including in emotional responding, social preference formation, speech and language processing (see especially Simonyan et al., 2012; see also Ripollés et al., 2014; McNamara and Durso, 2018), and generalization.
Second, predictions made by the weaker thesis are better supported than predictions made by competing theoretical accounts, e.g., by accounts in the predictive processing space or accounts emphasizing the role of emotions in our cognitive processes. Returning to the example of binocular rivalry offers a good example of the former comparison. The weaker thesis predicts that rewards (and negative rewards, i.e., punishments) should influence perceptual dominance in binocular rivalry; predictive processing accounts make no such prediction, and in fact struggle to explain this type of finding post hoc. But as noted above, reward modulates perceptual dominance in binocular rivalry (Haas, 2021).
An example of the latter type of comparison might involve competing explanations of psychopathy. The weaker thesis proposes that psychopathy is a disorder of valuation, perhaps involving an inability to predict negative outcomes, and/or an inability to update appropriately following negative experiences (e.g., see Oba et al., 2021). By contrast, on an account of psychopathy emphasizing emotions, individuals with psychopathic traits fundamentally suffer from a disorder of empathy, or the ability to respond appropriately to emotional stimuli (Hare, 1998; Soderstrom, 2003; Blair, 2007; Brook and Kosson, 2013; Domes et al., 2013; Blair, 2018). Accordingly, the former but not the latter account predicts that individuals with psychopathic traits will exhibit deficits in basic economic decision-making. Here, some evidence seems to bear out the weaker thesis: controlling for other deficits, psychopaths appear to perform significantly worse on the Iowa Gambling Task (Mahmut et al., 2008), as well as on other types of risky decision-making (Takahashi et al., 2014).
Third, deficits in the reward system corroborate the view. Here, standard cases again emerge in the computational and cognitive neuroscientific literature, including regarding the aforementioned Parkinson’s and Tourette’s diseases, as well as diseases such as Major Depressive Disorder and different categories of substance addiction. Take the case of prospective memory, or the ability to ‘remember to remember.’ I suggested above that, like so many of our cognitive capacities, prospective memory is conditional on valuation; we are more likely to ‘remember to remember’ something in the future when it’s associated with a reward. For example, participants show higher prospective memory performance for tasks that were associated with a monetary reward as compared to those that were not (Krishnan and Shapiro, 1999). By extension, consistent with the weaker thesis, we would expect to see deficits on prospective memory tasks among individuals with Parkinson’s disease. The reasoning goes like this: prospective memory is conditional on valuation, valuation is realized in the reward system in the brain, and the reward system is compromised in Parkinson’s disease. Hence, we should expect deficits on prospective memory tasks among individuals with Parkinson’s.
And this is indeed what we find. Individuals with Parkinson’s exhibit impairment in several core stages of prospective memory, most notably when it comes to the phases of intention formation and intention initiation (Katai et al., 2003; Kliegel et al., 2005, 2011; Pirogovsky et al., 2012; Ramanan and Kumar, 2013; D’Iorio et al., 2019; Coundouris et al., 2020; though see Zabberoni et al., 2017; Kinsella et al., 2018). Analogous arguments propose that impaired reward valuation, i.e., the dysfunctional underestimation, downgrading, or failure to update regarding rewards in individuals with Major Depressive Disorder (Takamura et al., 2017; Rupprechter et al., 2018, 2021), may explain why this demographic also exhibits systematic deficits in prospective memory tasks (Altgassen et al., 2009; Chen et al., 2013; Li et al., 2013; McFarland and Vasterling, 2018). And so on. The basic structure of this third kind of argument, then, is to identify a cognitive capacity modulated by valuation; identify a disease that either upregulates or downregulates valuation (via the reward system); and then determine whether, as predicted by the weaker thesis, individuals with the relevant disorder also exhibit deficits on the corresponding cognitive capacity.
Each of the three sets of reasons gives inductive support for the weaker thesis, by giving a confirming instance of it. Saying ‘valuation is ubiquitous in the mind’ is akin to saying ‘lots and lots of swans are white.’ This means that the weaker thesis can be disconfirmed—namely, by uncovering a meaningful number of instances where cognitive selection is clearly not, at least in part, underwritten by valuational processes. But this is in fact precisely why I defend the weaker thesis. The normative principles originating in RLDM, together with evidence from the decision sciences, enable us to make a principled but nonetheless fundamentally empirical claim about a certain process in the mind—where this claim already brings with it significant high-level implications for understanding the workings of the mind.
By contrast, these same principles and evidence, to my mind, will struggle to bear out something conceptually stronger, including a universal claim regarding the role of valuation in the mind, which I discuss next.
Whereas the weaker thesis holds that valuation is empirically ubiquitous in the mind, the stronger thesis proposes that the mind is at bottom evaluative in nature.
There are a couple of ways of understanding the stronger thesis. It can be understood as the claim that valuation as selection guides all cognitive selection in the mind. On this understanding, valuation amounts to a grand unifying theory for exploring the nature of the mind. This is the stronger, universal version of the weaker thesis. And it can be understood as the claim that valuation is ontologically prior to and thus conceptually necessary for understanding the mind’s perceptual, cognitive, and motor processes. We can call the former claim the scope commitment and the latter the priority commitment.
Prima facie, one might assume that a proponent of the evaluative thesis and RLDM model of valuation would by extension directly subscribe to one or both of these commitments. As we will see, they may hold some theoretical advantages over the weaker thesis. They are also nominally more in line with the prominent ‘Reward is Enough’ hypothesis (Silver et al., 2021). Nonetheless, I don’t commit to either.
So, why not go whole hog and defend the stronger version of the evaluative view? Let’s start with the scope commitment.
The scope commitment is a supercharged version of the weaker thesis. Whereas the weaker thesis holds that valuation is ubiquitous in the mind, the scope commitment holds that valuation lies at the heart of all cognitive capacities. Hence, where the weaker thesis suggests that ‘lots of swans are white,’ the scope commitment rounds up to claim that ‘all swans are white,’ period.
So formulated, the central challenge with the scope commitment should quickly become obvious: the scope commitment requires defending a universal claim, and no amount of evidence will get us there, as there’s always the possibility of an untested counterexample somewhere.10 The scope commitment is just too easily falsified.
Moreover, it simply doesn’t strike me as likely that valuation underwrites everything of interest in the mind. The evolved mind is a messy artifact, and at a bare minimum, we can expect ‘spandrel’ capacities that don’t rely on valuation in any interesting sense. I can get plenty of mileage out of the weaker thesis without needing to extend it to the logical limit.
This leaves us with the priority commitment. The priority commitment is trickier to deal with. The priority commitment makes an ontological claim about the mind: our ‘thinking’ processes are conditional on our evaluative processes. That is, we have the memories, beliefs and so on that we do in virtue of our assessments of better or worse. Note that this is analogous to action-first theories in cognitive science; for a review, see (Briscoe and Grush, 2020).
To take a concrete example of this kind of theorizing, one might argue that the normative function of episodic memory is not to encode a past event ‘as it actually happened,’ but rather to encode a past event in light of what it might be useful for an agent to remember—and by extension, do—in the future.
Adopting the priority commitment enables us to make top-down rather than inductive predictions regarding the workings of various cognitive capacities. For instance, to continue with the case of episodic memory, adopting the priority commitment can help us make predictions about what will and won’t be remembered, or why individuals experience flashbulb memories (if indeed they do). On the priority commitment, flashbulb memories may contain such an impressive level of detail because, following a traumatic event, it is not clear which features of the preceding event are most relevant to future action, such that ‘all’ of them are carried forward for future learning. This interpretation draws a close connection between flashbulb memories and the more general credit assignment problem in reinforcement learning, or the problem of determining which actions lead or led to a given outcome (Minsky, 1961; Sutton and Barto, 2018a).
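The credit assignment problem just mentioned has a textbook reinforcement-learning answer in eligibility traces, which spread the credit for a late-arriving reward back over recently visited states. The following is a minimal sketch under invented assumptions; the state names, reward, and parameters are not taken from the chapter.

```python
# Eligibility traces: a standard answer to the credit assignment
# problem, i.e., deciding which earlier states deserve credit for a
# later outcome. States, reward, and parameters are invented.
ALPHA, GAMMA, LAMBDA = 0.5, 0.9, 0.8

episode = ["wake", "walk", "platform", "event"]  # visited in order
reward_at_end = 1.0  # a single salient outcome at the final state

V = {s: 0.0 for s in episode}
trace = {s: 0.0 for s in episode}

for s, s_next in zip(episode, episode[1:]):
    r = reward_at_end if s_next == "event" else 0.0
    delta = r + GAMMA * V[s_next] - V[s]  # TD error at this step
    trace[s] += 1.0                       # mark s as recently visited
    # Every recently visited state shares in the credit, in proportion
    # to how recently it occurred (its decayed trace).
    for x in episode:
        V[x] += ALPHA * delta * trace[x]
        trace[x] *= GAMMA * LAMBDA

# Earlier states receive smaller but nonzero credit for the outcome.
print(V["platform"] > V["walk"] > V["wake"] > 0)  # True
```

When it is unclear which antecedent features mattered, broad traces carry ‘all’ of them forward, which is the mechanical analogue of the flashbulb-memory interpretation above.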
This kind of hypothesis generation is certainly appealing. It’s also pretty tempting to defend the priority of valuation as a way of counteracting the standard emphases placed on computation (and predictive processing!) in the philosophical and cognitive scientific literatures. Still, I stop short of doing so, for two reasons.
First, where the scope commitment is too easily falsified, the priority commitment is, conversely, unfalsifiable. If I can describe any cognitive or behavioral phenomenon of interest in terms of the maximization of reward, it becomes more difficult to test the hypothesis.
Second, ‘grand unifying’ theories of mind encourage us to recast broad swathes of empirical evidence into a single explanatory framework. However, the resulting explanations are sometimes less than illuminating. Moreover, surely some explanatory richness is lost if everything about the mind is ultimately, say, ‘imagination,’ ‘attention,’ or ‘prediction-error minimization.’ In some cases, these kinds of theories even run the risk of discounting evidence that is at odds with their theoretical commitments (Haas, 2021).
There’s no reason to expect that the priority commitment would avoid such a fate. To try and keep to a fine-grained and falsifiable view, I thus stick with the weaker thesis.
Finally, let me draw out a few points of comparison between the evaluative thesis explored in this paper and the prominent and somewhat controversial ‘reward is enough’ (RIE) hypothesis (Silver et al., 2021). RIE holds that reward maximization is enough to “drive behavior that exhibits most if not all abilities that are studied in natural and artificial intelligence” (Silver et al., 2021, 1, emphasis added). Here, reward is understood in the sense put forward by the basic reinforcement learning framework introduced in Section 16.2.
Like the stronger thesis, RIE involves a couple of different claims. First, RIE makes the epistemological claim that reward maximization is enough to understand many, if not all, features of intelligence. Implicit in this claim is that reward maximization provides better and richer explanations than other rival scientific theories do. Second, RIE makes the ontological claim that intelligent processes just are reward maximization processes, where “intelligence, and its associated abilities, can be understood as subserving the maximization of reward by an agent acting in its environment” (Silver et al., 2021, 5). And third, RIE makes the causal claim that reward maximization is sufficient to drive the kinds of abilities we associate with behavior, such as gathering nuts or playing Go. According to this last claim, the forms of intelligence “implicitly emerge” through and as a direct result of the process of reward maximization. By extension, the authors contend, “a good reward-maximizing agent, in the service of achieving its goal, could implicitly yield all the abilities associated with intelligence that have been considered in natural and artificial intelligence” (Silver et al., 2021, 5).
What is the relationship between the evaluative thesis and RIE? At least on their face, the stronger thesis’s priority commitment and RIE’s epistemological claim appear consistent: the role of reward provides a unified and valuable way of understanding the mind and the nature of intelligence.
But the evaluative thesis and RIE come apart on the ontological and causal fronts. At the end of the day, even the stronger thesis amounts to a pair of claims about the function and scope of a cognitive process in the mind. By contrast, RIE suggests that all intelligent processing is an expression or by-product of reward maximization, where, at bottom, the pursuit of reward drives the emergence of all other kinds of intelligence. These start to look like two very different kinds of arguments.
This being said, one softening feature of RIE is that it makes a pragmatic bet regarding the role of reward maximization in generating diverse forms of intelligence in artificial agents. That is, the authors of RIE propose that pure reinforcement learning frameworks will be sufficient to arrive at artificial general intelligence, without the need for handcrafting or pre-training. The authors acknowledge,
We do not offer any theoretical guarantee on the sample efficiency of the reinforcement learning agent. Indeed, the rate at and degree to which abilities emerge will depend upon the specific environment, learning algorithm, and inductive biases; furthermore one may construct artificial environments in which learning will fail. Instead, we conjecture that the solution strategy of learning to maximize reward via interaction will be ‘enough’ for intelligence, and its associated abilities, to emerge in practice. (Silver et al., 2021, 10)
In this sense, by adopting a kind of maker’s approach (Craver, 2021), RIE is at least indirectly falsifiable through efforts to leverage reward maximization to design artificial intelligence.11
At the outset of this chapter, I proposed that RLDM is an instance of mind design so successful that we have not quite figured out what to do with it yet. I further argued that, in light of this success, we should move beyond characterizing the mind as exhaustively constituted by “thinking, intellect,” as Haugeland originally put it, and begin to recognize its fundamentally evaluative nature. At the same time, I’ve sought to distinguish my view, which some philosophers may take to be remarkably strong, from even stronger views, which are more in line with views held by some in the machine learning and reinforcement learning literatures.
By way of conclusion, I want to briefly address what Haugeland called the common complaint about artificial intelligence, namely, that it cannot or may never achieve the rich interiority of everyday life, including “feelings, emotions, ego, imagination, moods, consciousness—the whole ‘phenomenology’ of an inner life. No matter how smart the machines become, there’s still ‘nobody home’” (this volume, p. 33). Haugeland’s characterisation is reminiscent of the traditional dichotomized conception of the mind: namely, of understanding the mind in terms of ‘thinking’ and, well, ‘everything else’—even if the ‘everything else’ includes a lot of the important processes.
The notion of valuation—normatively rich, empirically substantiated—allows us to put pressure on this type of traditional, dichotomized view. At a minimum, it challenges the idea that we can in good scientific conscience continue to group together phenomena as disparate as emotions, consciousness, and ego under the heading of ‘phenomenology.’ As noted above, with a notion of valuation in place, we can, for instance, start to work out the relationship and differences between valuation and the various philosophical theories of emotion, or the role of valuation in driving instances of imagination (Gershman et al., 2017). Moreover, without in any way diminishing the ‘thinking’ or ‘computational’ mind, valuation brings with it new avenues for revising our extant philosophical and psychological cognitive taxonomies (Janssen et al., 2017).
More broadly, the notion of valuation challenges our assumptions regarding which aspects of mind can or cannot be quantified—and thereby understood in properly scientific terms. For example, in their discussion of “intelligence” and “intelligent” processes, Silver and colleagues (2021) largely appeal to features of the conventionally thinking mind such as perception, language, and generalization. But what the foregoing discussion should show is that we can also appeal to the normative principles of RLDM to better decompose and understand those allegedly more ‘qualitative’ aspects of the mind such as valuation—and, by extension, our personal-level capacities such as motivation, cognitive control, choice, and moral cognition.
We should also carry these insights forward into our ongoing efforts at mind design. That is, as we make advances toward more sophisticated artificial intelligence and, in particular, artificial general intelligence, we can enrich our understanding of the kinds of mental capacities that we can and should include in these efforts—and we should move past the idea of designing only ‘thinking’ machines in the traditional sense.
1. Of course, many approaches in the philosophy of mind and cognitive science posit what we might call ‘compound states,’ such as desires, that may be similarly evaluative. But it’s consistent with such views that evaluative compound states are outliers—that “other stuff”—and overshadowed by traditional descriptive computations and belief-like states and processes. The weaker thesis makes a stronger claim, in that it posits widespread evaluative processing at a fundamental level and, notably, where evaluative processing modulates even belief-like states and processes. Thanks to Murray Shanahan for pressing me on this point.
2. Name adapted from Gesiarz and Crockett (2015).
3. Thanks to Neil Rabinowitz for this formulation.
4. For a more detailed discussion, see Sutton and Barto (2018a, Chapter 6, and especially Example 6.1).
5. Thanks to Neil Rabinowitz for this formulation.
6. This section is indebted to Sutton and Barto (2018a) and, especially, to Neil Rabinowitz.
7. This subpersonal process very likely plays a role in our personal-level experiences of ‘value,’ ‘valuing,’ and ‘values,’ e.g., see foregoing discussion of willingness-to-pay. But the focus throughout the remainder of this paper will be on the nature and workings of the subpersonal process.
8. It is worth emphasizing that valuation needn’t be ‘online’ in order to guide selection. On the contrary, as in the foregoing example of retroactive attribution, selection can and often is informed by past reward and value attributions. And this ‘carried over’ feature of valuation as selection in turn has important implications for the nature of self-regulation and control, insofar as it implies that at least in many cases, we do not have direct, intrapsychic control over our motivational states (see Haas, in prep). Thanks to Neil Rabinowitz for pressing me on this point.
9. And has been for millions of years: see, e.g., the role of reinforcement signaling in Drosophila (Waddell, 2013; see also Haas and Klein, 2020). Though this is beyond the scope of the current paper, valuation appears to be a highly conserved cognitive process.
10. Thanks to Carl Craver for helping me drill down on this point.
11. Thanks to Neil Rabinowitz for pressing me on this point, and to Sean Legassick and Hado van Hasselt for helpful discussions of the RIE thesis.
Andy Clark
2013
“The whole function of the brain is summed up in: error correction.” So wrote W. Ross Ashby, the British psychiatrist and cyberneticist, some half a century ago.1 Computational neuroscience has come a very long way since then. There is now increasing reason to believe that Ashby’s (admittedly somewhat vague) statement is correct, and that it captures something crucial about the way that spending metabolic money to build complex brains pays dividends in the search for adaptive success. In particular, one of the brain’s key tricks, it now seems, is to implement dumb processes that correct a certain kind of error: error in the multi-layered prediction of input. In mammalian brains, such errors look to be corrected within a cascade of cortical processing events in which higher-level systems attempt to predict the inputs to lower-level ones on the basis of their own emerging models of the causal structure of the world (i.e., the signal source). Errors in predicting lower-level inputs cause the higher-level models to adapt so as to reduce the discrepancy. Such a process, operating over multiple linked higher-level models, yields a brain that encodes a rich body of information about the source of the signals that regularly perturb it.
Such models follow Helmholtz (1860/1962) in depicting perception as a process of probabilistic, knowledge-driven inference. From Helmholtz comes the key idea that sensory systems are in the tricky business of inferring sensory causes from their bodily effects. This in turn involves computing multiple probability distributions, since a single such effect will be consistent with many different sets of causes distinguished only by their relative (and context-dependent) probability of occurrence.
Helmholtz’s insight informed influential work by MacKay (1956), Neisser (1967), and Gregory (1980), as part of the cognitive psychological tradition that became known as “analysis-by-synthesis” (for a review, see Yuille and Kersten, 2006). In this paradigm, the brain does not build its current model of distal causes (its model of how the world is) simply by accumulating, from the bottom-up, a mass of low-level cues such as edge-maps and so forth. Instead (see Hohwy, 2007), the brain tries to predict the current suite of cues from its best models of the possible causes. In this way:
The mapping from low- to high-level representation (e.g. from acoustic to word-level) is computed using the reverse mapping, from high- to low-level representation. (Chater and Manning, 2006, p. 340, their emphasis)
Helmholtz’s insight was also pursued in an important body of computational and neuroscientific work. Crucial to this lineage were seminal advances in machine learning that began with pioneering connectionist work on back-propagation learning (McClelland et al., 1986b; Rumelhart et al., 1986b) and continued with work on the aptly named “Helmholtz Machine” (Dayan et al., 1995; Dayan and Hinton, 1996; see also Hinton and Zemel, 1994).2 The Helmholtz Machine sought to learn new representations in a multilevel system (thus capturing increasingly deep regularities within a domain) without requiring the provision of copious pre-classified samples of the desired input-output mapping. In this respect, it aimed to improve (see Hinton, 2010) upon standard back-propagation driven learning. It did this by using its own top-down connections to provide the desired states for the hidden units, thus (in effect) self-supervising the development of its perceptual “recognition model” using a generative model that tried to create the sensory patterns for itself (in “fantasy,” as it was sometimes said).3 (For a useful review of this crucial innovation and a survey of many subsequent developments, see Hinton, 2007a).
A generative model, in this quite specific sense, aims to capture the statistical structure of some set of observed inputs by tracking (one might say, by schematically recapitulating) the causal matrix responsible for that very structure. A good generative model for vision would thus seek to capture the ways in which observed lower-level visual responses are generated by an interacting web of causes—for example, the various aspects of a visually presented scene. In practice, this means that top-down connections within a multilevel (hierarchical and bidirectional) system come to encode a probabilistic model of the activities of units and groups of units within lower levels, thus tracking (as we shall shortly see in more detail) interacting causes in the signal source, which might be the body or the external world—see, for example, Kawato et al. (1993); Hinton and Zemel (1994); Mumford (1994); Hinton et al. (1995); Dayan et al. (1995); Olshausen and Field (1996); Dayan (1997); Hinton and Ghahramani (1997).
It is this twist—the strategy of using top-down connections to try to generate, using high-level knowledge, a kind of “virtual version” of the sensory data via a deep multilevel cascade—that lies at the heart of “hierarchical predictive coding” approaches to perception; for example, Rao and Ballard (1999), Lee and Mumford (2003), Friston (2005). Such approaches, along with their recent extensions to action—as exemplified in Friston and Stephan (2007), Friston et al. (2009), Friston (2010), Brown et al. (2011)—form the main focus of the present treatment. These approaches combine the use of top-down probabilistic generative models with a specific vision of one way such downward influence might operate. That way (borrowing from work in linear predictive coding—see below) depicts the top-down flow as attempting to predict and fully “explain away” the driving sensory signal, leaving only any residual “prediction errors” to propagate information forward within the system—see Rao and Ballard (1999), Lee and Mumford (2003), Friston (2005), Hohwy et al. (2008), Jehee and Ballard (2009), Friston (2010), Brown et al. (2011); and, for a recent review, see Huang and Rao (2011).
Predictive coding itself was first developed as a data compression strategy in signal processing (for a history, see Shi and Sun, 1999). Thus, consider a basic task such as image transmission: In most images, the value of one pixel regularly predicts the value of its nearest neighbors, with differences marking important features such as the boundaries between objects. That means that the code for a rich image can be compressed (for a properly informed receiver) by encoding only the “unexpected” variation: the cases where the actual value departs from the predicted one. What needs to be transmitted is therefore just the difference (a.k.a. the “prediction error”) between the actual current signal and the predicted one. This affords major savings on bandwidth, an economy that was the driving force behind the development of the techniques by James Flanagan and others at Bell Labs during the 1950s (for a review, see Musmann, 1979). Descendants of this kind of compression technique are currently used in JPEGs, in various forms of lossless audio compression, and in motion-compressed coding for video. The information that needs to be communicated “upward” under all these regimes is just the prediction error: the divergence from the expected signal. Transposed (in ways we are about to explore) to the neural domain, this makes prediction error into a kind of proxy (Feldman and Friston, 2010) for sensory information itself. Later, when we consider predictive processing in the larger setting of information theory and entropy, we will see that prediction error reports the “surprise” induced by a mismatch between the sensory signals encountered and those predicted. More formally—and to distinguish it from surprise in the normal, experientially loaded sense—this is known as surprisal (Tribus, 1961).
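The pixel-prediction idea can be made concrete with a small sketch (my own illustration; the numbers are arbitrary). A transmitter sends only the residuals from a "same as left neighbor" predictor, and a receiver that knows the prediction rule reconstructs the signal exactly; the surprisal lines at the end show, in the information-theoretic sense, why poorly predicted observations are the expensive ones.

```python
import numpy as np

# Predictive coding as compression: each pixel is predicted by its left
# neighbor; only the prediction errors (residuals) are transmitted.
signal = np.array([10, 11, 12, 12, 50, 51, 52], dtype=float)

predictions = np.concatenate(([0.0], signal[:-1]))  # "same as neighbor" guess
errors = signal - predictions                        # what gets transmitted

# The receiver, knowing the prediction rule, recovers the signal exactly
# from the error stream alone.
reconstructed = np.cumsum(errors)
assert np.allclose(reconstructed, signal)

# Most errors are small (cheap to encode); the large one at index 4 marks
# an "unexpected" feature, e.g. an object boundary.
print(errors)  # [10.  1.  1.  0. 38.  1.  1.]

# Surprisal: an improbable (badly predicted) observation carries more bits.
p_expected, p_unexpected = 0.9, 0.1
print(-np.log2(p_expected), -np.log2(p_unexpected))  # ~0.15 bits vs ~3.32 bits
```

Note the asymmetry this buys: the prediction rule lives at both ends, so the channel only ever carries deviations from it, which is exactly the role the text assigns to forward-flowing prediction error in the neural case.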
Hierarchical predictive processing combines the use, within a multilevel bidirectional cascade, of “top-down” probabilistic generative models with the core predictive coding strategy of efficient encoding and transmission. Such approaches, originally developed in the domain of perception, have been extended (by Friston and others—see sect. 17.1.3) to encompass action, and to offer an attractive, unifying perspective on the brain’s capacities for learning, inference, and the control of plasticity. Perception and action, if these unifying models are correct, are intimately related and work together to reduce prediction error by sculpting and selecting sensory inputs. In the remainder of this section, I rehearse some of the main features of these models before highlighting (in sects. 17.2-17.5 following) some of their most conceptually important and challenging aspects.
A good place to start (following Rieke, 1999) is with what might be thought of as the “view from inside the black box.” For, the task of the brain, when viewed from a certain distance, can seem impossible: it must discover information about the likely causes of impinging signals without any form of direct access to their source. Thus, consider a black box taking inputs from a complex external world. The box has input and output channels along which signals flow. But all that it “knows”, in any direct sense, are the ways its own states (e.g., spike trains) flow and alter. In that (restricted) sense, all the system has direct access to is its own states. The world itself is thus off-limits (though the box can, importantly, issue motor commands and await developments). The brain is one such black box. How, simply on the basis of patterns of changes in its own internal states, is it to alter and adapt its responses so as to tune itself to act as a useful node (one that merits its relatively huge metabolic expense) for the origination of adaptive responses? Notice how different this conception is to ones in which the problem is posed as one of establishing a mapping relation between environmental and inner states. The task is not to find such a mapping but to infer the nature of the signal source (the world) from just the varying input signal itself.
Hierarchical approaches in which top-down generative models are trying to predict the flow of sensory data provide a powerful means for making progress under such apparently unpromising conditions. One key task performed by the brain, according to these models, is that of guessing the next states of its own neural economy. Such guessing improves when you use a good model of the signal source. Cast in the Bayesian mode, good guesses thus increase the posterior probability4 of your model. Various forms of gradient descent learning can progressively improve your first guesses. Applied within a hierarchical predictive processing5 regime, this will—if you survive long enough—tend to yield useful generative models of the signal source (ultimately, the world).
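The claim that good guesses raise a model's posterior probability can be illustrated with a deliberately tiny example (mine, not the chapter's): two candidate generative models of a binary sensory channel, with posteriors updated by Bayes' rule as observations arrive. The model names and probabilities are hypothetical.

```python
import numpy as np

# Two candidate generative models of a binary "pixel" channel: each model
# is just a probability of observing a bright pixel (1).
models = {"world_is_mostly_dark": 0.2,
          "world_is_mostly_bright": 0.8}

posterior = {m: 0.5 for m in models}   # start undecided (uniform prior)
data = [1, 1, 0, 1, 1, 1]              # observed pixels (1 = bright)

for x in data:
    # Multiply in the likelihood of this observation under each model...
    for m, p_bright in models.items():
        posterior[m] *= p_bright if x == 1 else 1 - p_bright
    # ...and renormalize so the posteriors sum to one.
    total = sum(posterior.values())
    posterior = {m: v / total for m, v in posterior.items()}

# The model whose guesses matched the data now dominates.
print(posterior)
```

The gradient-descent learning mentioned in the text plays the analogous role in continuous, high-dimensional settings: rather than reweighting a handful of discrete hypotheses, it nudges the parameters of one generative model so that its predictions fit the incoming signal better.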
The beauty of the bidirectional hierarchical structure is that it allows the system to infer its own priors (the prior beliefs essential to the guessing routines) as it goes along. It does this by using its best current model—at one level—as the source of the priors for the level below, engaging in a process of “iterative estimation” (see Dempster et al., 1977; Neal and Hinton, 1998) that allows priors and models to co-evolve across multiple linked layers of processing so as to account for the sensory data. The presence of bidirectional hierarchical structure thus induces “empirical priors”6 in the form of the constraints that one level in the hierarchy places on the level below, and these constraints are progressively tuned by the sensory input itself. This kind of procedure (which implements a version of “empirical Bayes”; Robbins, 1956) has an appealing mapping to known facts about the hierarchical and reciprocally connected structure and wiring of cortex (Friston, 2005; Lee and Mumford, 2003).7
A classic early example, combining this kind of hierarchical learning with the basic predictive coding strategy described in section 17.1.1, is Rao and Ballard’s (1999) model of predictive coding in the visual cortex. At the lowest level, there is some pattern of energetic stimulation, transduced (let’s suppose) by sensory receptors from ambient light patterns produced by the current visual scene. These signals are then processed via a multilevel cascade in which each level attempts to predict the activity at the level below it via backward8 connections. The backward connections allow the activity at one stage of the processing to return as another input at the previous stage. So long as this successfully predicts the lower level activity, all is well, and no further action needs to ensue. But where there is a mismatch, “prediction error” occurs and the ensuing (error-indicating) activity is propagated to the higher level. This automatically adjusts probabilistic representations at the higher level so that top-down predictions cancel prediction errors at the lower level (yielding rapid perceptual inference). At the same time, prediction error is used to adjust the structure of the model so as to reduce any discrepancy next time around (yielding slower timescale perceptual learning). Forward connections between levels thus carry the “residual errors” (Rao and Ballard, 1999, p. 79) separating the predictions from the actual lower level activity, while backward connections (which do most of the “heavy lifting” in these models) carry the predictions themselves. Changing predictions corresponds to changing or tuning your hypothesis about the hidden causes of the lower level activity.
The concurrent running of this kind of prediction error calculation within a loose bidirectional hierarchy of cortical areas allows information pertaining to regularities at different spatial and temporal scales to settle into a mutually consistent whole in which each “hypothesis” is used to help tune the rest. As the authors put it:
Prediction and error-correction cycles occur concurrently throughout the hierarchy, so top-down information influences lower-level estimates, and bottom-up information influences higher-level estimates of the input signal. (Rao and Ballard, 1999, p. 80)
In the visual cortex, such a scheme suggests that backward connections from V2 to V1 would carry a prediction of expected activity in V1, while forward connections from V1 to V2 would carry forward the error signal indicating residual (unpredicted) activity.
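The prediction-and-error loop can be sketched numerically. What follows is a minimal toy of my own, not Rao and Ballard's implementation: a single linear generative mapping W stands in for the backward connections, and iterative gradient updates on the higher-level estimate r stand in for rapid perceptual inference (slower learning of W itself is omitted); the matrix and learning rate are illustrative.

```python
import numpy as np

# Top-down ("backward") generative weights: how hidden causes produce
# lower-level activity. Chosen small and well-conditioned for the demo.
W = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.]])

true_causes = np.array([1.0, -0.5, 2.0])
x = W @ true_causes          # lower-level activity generated by hidden causes

r = np.zeros(3)              # higher-level hypothesis about the causes
for _ in range(200):
    prediction = W @ r       # backward connections: predict the lower level
    error = x - prediction   # forward connections: carry the residual error
    r += 0.1 * W.T @ error   # adjust the hypothesis to cancel the error

# After settling, the top-down prediction explains away the input and the
# forward error signal is (nearly) silenced; r recovers the hidden causes.
assert np.allclose(W @ r, x, atol=1e-6)
print(np.round(r, 3))        # r ≈ [1.0, -0.5, 2.0]
```

The stacked, cortical version of this runs the same loop at every pair of adjacent levels simultaneously, so each level's settled estimate doubles as the "input" that the level above is trying to predict.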
To test these ideas, Rao and Ballard implemented a simple bidirectional hierarchical network of such “predictive estimators” and trained it on image patches derived from five natural scenes. Using learning algorithms that progressively reduce prediction error across the linked cascade and after exposure to thousands of image patches, the system learnt to use responses in the first level network to extract features such as oriented edges and bars, while the second level network came to capture combinations of such features corresponding to patterns involving larger spatial configurations. The model also displayed a number of interesting “extra-classical receptive field” effects, suggesting that such non-classical surround effects (and, as we’ll later see, context effects more generally) may be a rather direct consequence of the use of hierarchical predictive coding.
For immediate purposes, however, what matters is that the predictive coding approach, given only the statistical properties of the signals derived from the natural images, was able to induce a kind of generative model of the structure of the input data: It learned about the presence and importance of features such as lines, edges, and bars, and about combinations of such features, in ways that enable better predictions concerning what to expect next, in space or in time. The cascade of processing induced by the progressive reduction of prediction error in the hierarchy reveals the world outside the black box. It maximizes the posterior probability of generating the observed states (the sensory inputs), and, in so doing, induces a kind of internal model of the source of the signals: the world hidden behind the veil of perception.
Recent work by Friston (2003; 2010; and with colleagues: Brown et al., 2011; Friston et al., 2009) generalizes this basic “hierarchical predictive processing” model to include action. According to what I shall now dub “action-oriented predictive processing,”9 perception and action both follow the same deep “logic” and are even implemented using the same computational strategies. A fundamental attraction of these accounts thus lies in their ability to offer a deeply unified account of perception, cognition, and action.
Perception, as we saw, is here depicted as a process that attempts to match incoming “driving” signals with a cascade of top-down predictions (spanning multiple spatial and temporal scales) that aim to cancel them out. Motor action exhibits a surprisingly similar profile, except that:
In motor systems error signals self-suppress, not through neuronally mediated effects, but by eliciting movements that change bottom-up proprioceptive and sensory input. This unifying perspective on perception and action suggests that action is both perceived and caused by its perception. (Friston, 2003, p. 1349)
This whole scenario is wonderfully captured by Hawkins and Blakeslee, who write that:
As strange as it sounds, when your own behaviour is involved, your predictions not only precede sensation, they determine sensation. Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfil the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy. (Hawkins and Blakeslee, 2004, p. 158)
A closely related body of work in so-called optimal feedback control theory (e.g., Todorov, 2009; Todorov and Jordan, 2002) displays the motor control problem as mathematically equivalent to Bayesian inference. Very roughly—see Todorov (2009) for a detailed account—you treat the desired (goal) state as observed and perform Bayesian inference to find the actions that get you there. This mapping between perception and action emerges also in some recent work on planning (e.g., Toussaint, 2009). The idea, closely related to these approaches to simple movement control, is that in planning we imagine a future goal state as actual, then use Bayesian inference to find the set of intermediate states (which can now themselves be whole actions) that get us there. There is thus emerging a fundamentally unified set of computational models which, as Toussaint (2009, p. 29) comments, “does not distinguish between the problems of sensor processing, motor control, or planning.” Toussaint’s bold claim is modified, however, by the important caveat (op. cit., p. 29) that we must, in practice, deploy approximations and representations that are specialized for different tasks. But at the very least, it now seems likely that perception and action are in some deep sense computational siblings and that:
The best ways of interpreting incoming information via perception, are deeply the same as the best ways of controlling outgoing information via motor action … so the notion that there are a few specifiable computational principles governing neural function seems plausible. (Eliasmith, 2007, p. 380)
Action-oriented predictive processing goes further, however, in suggesting that motor intentions actively elicit, via their unfolding into detailed motor actions, the ongoing streams of sensory (especially proprioceptive) results that our brains predict. This deep unity between perception and action emerges most clearly in the context of so-called active inference, where the agent moves its sensors in ways that amount to actively seeking or generating the sensory consequences that they (or rather, their brains) expect (Friston, 2009; Friston et al., 2010). Perception, cognition, and action—if this unifying perspective proves correct—work closely together to minimize sensory prediction errors by selectively sampling, and actively sculpting, the stimulus array. They thus conspire to move a creature through time and space in ways that fulfil an ever-changing and deeply inter-animating set of (sub-personal) expectations. According to these accounts, then:
Perceptual learning and inference is necessary to induce prior expectations about how the sensorium unfolds. Action is engaged to resample the world to fulfil these expectations. This places perception and action in intimate relation and accounts for both with the same principle. (Friston et al., 2009, p. 12)
In some (I’ll call them the “desert landscape”) versions of this story (see especially Friston, 2011b; Friston et al., 2010) proprioceptive prediction errors act directly as motor commands. On these models it is our expectations about the proprioceptive consequences of moving and acting that directly bring the moving and acting about.10 I return briefly to these “desert landscape” scenarios in section 17.5.1 further on.
The hierarchical predictive processing account, along with the more recent generalizations to action, represents (or so I shall now argue) a genuine departure from many of our previous ways of thinking about perception, cognition, and the human cognitive architecture. It offers a distinctive account of neural representation, neural computation, and the representation relation itself. It depicts perception, cognition, and action as profoundly unified and, in important respects, continuous. And it offers a neurally plausible and computationally tractable gloss on the claim that the brain performs some form of Bayesian inference.
To successfully represent the world in perception, if these models are correct, depends crucially upon cancelling out sensory prediction error. Perception thus involves “explaining away” the driving (incoming) sensory signal by matching it with a cascade of predictions pitched at a variety of spatial and temporal scales. These predictions reflect what the system already knows about the world (including the body) and the uncertainties associated with its own processing. Perception here becomes “theory-laden” in at least one (rather specific) sense: What we perceive depends heavily upon the set of priors (including any relevant hyper-priors) that the brain brings to bear in its best attempt to predict the current sensory signal. On this model, perception demands the success of some mutually supportive stack of states of a generative model (recall sect. 17.1.1 above) at minimizing prediction error by hypothesizing an interacting set of distal causes that predict, accommodate, and (thus) “explain away” the driving sensory signal.
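The "explaining away" dynamic can be caricatured numerically (the weights and signal below are invented toy values, and the probabilistic, multi-scale detail is ignored): a hypothesized cause is iteratively adjusted until the generative model's prediction cancels the driving sensory signal.

```python
import numpy as np

# Minimal sketch (toy numbers, not from the text) of perception as
# prediction-error minimization: a hypothesized cause r is adjusted
# until the generative model's prediction W @ r cancels out, i.e.
# "explains away", the driving sensory signal x.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))    # assumed generative mapping: causes -> signal
true_r = np.array([1.0, -0.5])
x = W @ true_r                 # noiseless driving sensory signal

r = np.zeros(2)                # initial hypothesis about the distal causes
lr = 0.05
for _ in range(5000):
    error = x - W @ r          # prediction error: what remains unexplained
    r += lr * W.T @ error      # revise the hypothesis to reduce the error

residual = np.linalg.norm(x - W @ r)
print(residual)  # small: the driving signal has been explained away
```

When the residual has been driven to (near) zero, nothing remains to be passed forward: the inferred causes fully accommodate the input.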
This appeal to “explaining away” is important and central, but it needs very careful handling. It is important as it reflects the key property of hierarchical predictive processing models, which is that the brain is in the business of active, ongoing, input prediction and does not (even in the early sensory case) merely react to external stimuli. It is important also insofar as it is the root of the attractive coding efficiencies that these models exhibit, since all that needs to be passed forward through the system is the error signal, which is what remains once predictions and driving signals have been matched.11 In these models it is therefore the backward (recurrent) connectivity that carries the main information processing load. We should not, however, overplay this difference. In particular, it is potentially misleading to say that:
Activation in early sensory areas no longer represents sensory information per se, but only that part of the input that has not been successfully predicted by higher-level areas. (de-Wit et al., 2010, p. 8702)
It is potentially misleading because this stresses only one aspect of what is (at least in the context of the rather specific models we have been considering12) actually depicted as a kind of duplex architecture: one that at each level combines quite traditional representations of inputs with representations of error. According to the duplex proposal, what gets “explained away” or cancelled out is the error signal, which (in these models) is depicted as computed by dedicated “error units.” These are linked to, but distinct from, the so-called representation units meant to encode the causes of sensory inputs. By cancelling out the activity of the error units, activity in some of the laterally interacting “representation” units (which then feed predictions downward and are in the business of encoding the putative sensory causes) can actually end up being selected and sharpened. The hierarchical predictive processing account thus avoids any direct conflict with accounts (e.g., biased-competition models such as that of Desimone and Duncan, 1995) that posit top-down enhancements of selected aspects of the sensory signal, because:
High-level predictions explain away prediction error and tell the error units to “shut up” [while] units encoding the causes of sensory input are selected by lateral interactions, with the error units, that mediate empirical priors. This selection stops the gossiping [hence actually sharpens responses among the laterally competing representations]. (Friston, 2005, p. 829)
The drive towards “explaining away” is thus consistent, in this specific architectural setting, with both the sharpening and the dampening of (different aspects of) early cortical response.13 Thus Spratling, in a recent formal treatment of this issue,14 suggests that any apparent contrast here reflects:
A misinterpretation of the model that may have resulted from the strong emphasis the predictive coding hypothesis places on the error-detecting nodes and the corresponding under-emphasis on the role of the prediction nodes in maintaining an active representation of the stimulus. (Spratling, 2008, p. 8, my emphasis)
What is most distinctive about this duplex architectural proposal (and where much of the break from tradition really occurs) is that it depicts the forward flow of information as solely conveying error, and the backward flow as solely conveying predictions. The duplex architecture thus achieves a rather delicate balance between the familiar (there is still a cascade of feature-detection, with potential for selective enhancement, and with increasingly complex features represented by neural populations that are more distant from the sensory peripheries) and the novel (the forward flow of sensory information is now entirely replaced by a forward flow of prediction error).
This balancing act between cancelling out and selective enhancement is made possible, it should be stressed, only by positing the existence of “two functionally distinct subpopulations, encoding the conditional expectations of perceptual causes and the prediction error respectively” (Friston, 2005, p. 829). Functional distinctness need not, of course, imply gross physical separation. But a common conjecture in this literature depicts superficial pyramidal cells (a prime source of forward neuro-anatomical connections) as playing the role of error units, passing prediction error forward, while deep pyramidal cells play the role of representation units, passing predictions (made on the basis of a complex generative model) downward (see, e.g., Friston, 2005, 2009; Mumford, 1992). However it may (or may not) be realized, some form of functional separation is required. Such separation constitutes a central feature of the proposed architecture, and one without which it would be unable to combine the radical elements drawn from predictive coding with simultaneous support for the more traditional structure of increasingly complex feature detection and top-down signal enhancement. But essential as it is, this is a demanding and potentially problematic requirement.
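The duplex arrangement can be sketched in a few lines (the weights and signal below are arbitrary toy values, and real proposals are probabilistic and far richer): each level pairs representation units with error units; errors flow forward, predictions flow backward.

```python
import numpy as np

# Toy two-level "duplex" sketch (invented numbers): each level holds
# representation units r and error units e. Error is passed forward
# (upward); predictions are passed backward (downward).

rng = np.random.default_rng(1)
W1 = rng.normal(size=(6, 3))   # level-1 generative weights: r1 -> signal
W2 = rng.normal(size=(3, 2))   # level-2 generative weights: r2 -> r1

x = rng.normal(size=6)         # driving sensory signal
r1 = np.zeros(3)
r2 = np.zeros(2)
lr = 0.02

for _ in range(3000):
    e0 = x - W1 @ r1           # error units at the sensory level
    e1 = r1 - W2 @ r2          # level-1 error: mismatch with level-2's prediction
    # Representation units are pushed by the error from below and
    # constrained by the prediction arriving from above:
    r1 += lr * (W1.T @ e0 - e1)
    r2 += lr * (W2.T @ e1)

print(np.linalg.norm(x - W1 @ r1))  # residual sensory prediction error
```

Note that only the error terms (e0, e1) would need to be communicated upward here; the downward traffic consists entirely of predictions (W1 @ r1, W2 @ r2), matching the division of labor described above.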
In the context of bidirectional hierarchical models of brain function, action-oriented predictive processing yields a new account of the complex interplay between top-down and bottom-up influences on perception and action, and perhaps ultimately of the relations between perception, action, and cognition.
As noted by Hohwy (2007, p. 320) the generative model providing the “top-down” predictions is here doing much of the more traditionally “perceptual” work, with the bottom-up driving signals really providing a kind of ongoing feedback on their activity (by fitting, or failing to fit, the cascade of downward-flowing predictions). This procedure combines “top-down” and “bottom-up” influences in an especially delicate and potent fashion, and it leads to the development of neurons that exhibit a “selectivity that is not intrinsic to the area but depends on interactions across levels of a processing hierarchy” (Friston, 2003, p. 1349). Hierarchical predictive coding delivers, that is to say, a processing regime in which context-sensitivity is fundamental and pervasive.
To see this, we need only reflect that the neuronal responses that follow an input (the “evoked responses”) may be expected to change quite profoundly according to the contextualizing information provided by a current winning top-down prediction. The key effect here (itself familiar enough from earlier connectionist work using the “interactive activation” paradigm; see, e.g., McClelland and Rumelhart, 1981; Rumelhart et al., 1986b) is that, “when a neuron or population is predicted by top-down inputs it will be much easier to drive than when it is not” (Friston, 2002, p. 240). This is because the best overall fit between driving signal and expectations will often be found by (in effect) inferring noise in the driving signal and thus recognizing a stimulus as, for example, the letter m (say, in the context of the word “mother”) even though the same bare stimulus, presented out of context or in most other contexts, would have been a better fit with the letter n.15 A unit normally responsive to the letter m might, under such circumstances, be successfully driven by an n-like stimulus.
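The m/n example can be put in numbers (mine, purely illustrative): with a flat prior, the slightly better likelihood of n wins; a strong "mother"-context prior flips the verdict to m.

```python
# Toy numbers (invented, not the chapter's) for the context effect:
# the same ambiguous stimulus is read as "n" out of context but as
# "m" in the context of "mother", because the prior differs.

def posterior_m(prior_m, like_m=0.4, like_n=0.6):
    """P(m | stimulus) when the bare stimulus slightly favors n."""
    p_m = prior_m * like_m
    p_n = (1 - prior_m) * like_n
    return p_m / (p_m + p_n)

print(posterior_m(prior_m=0.5))  # ≈ 0.4: out of context, n wins
print(posterior_m(prior_m=0.9))  # ≈ 0.86: in "mother" context, m wins
```

The likelihood (the bare stimulus evidence) is identical in both calls; only the context-supplied prior differs, yet the perceptual verdict reverses.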
Such effects are pervasive in hierarchical predictive processing, and have far-reaching implications for various forms of neuroimaging. It becomes essential, for example, to control as much as possible for expectations when seeking to identify the response selectivity of neurons or patterns of neural activity. Strong effects of top-down expectation have also recently been demonstrated for conscious recognition, raising important questions about the very idea of any simple (i.e., context independent) “neural correlates of consciousness.” Thus, Melloni et al. (2011) show that the onset time required to form a reportable conscious percept varies substantially (by around 100 msec) according to the presence or absence of apt expectations, and that the neural (here, EEG) signatures of conscious perception vary accordingly—a result these authors go on to interpret using the apparatus of hierarchical predictive processing. Finally, in a particularly striking demonstration of the power of top-down expectations, Egner et al. (2010) show that neurons in the fusiform face area (FFA) respond every bit as strongly to non-face (in this experiment, house) stimuli under high expectation of faces as they do to face-stimuli. In this study:
FFA activity displayed an interaction of stimulus feature and expectation factors, where the differentiation between FFA responses to face and house stimuli decreased linearly with increasing levels of face expectation, with face and house evoked signals being indistinguishable under high face expectation. (Egner et al., 2010, p. 16607)
Only under conditions of low face expectation was FFA response maximally different for the face and house probes, suggesting that “[FFA] responses appear to be determined by feature expectation and surprise rather than by stimulus features per se” (Egner et al., 2010, p. 16601). The suggestion, in short, is that FFA (in many ways the paradigm case of a region performing complex feature detection) might be better treated as a face-expectation region rather than as a face-detection region: a result that the authors interpret as favoring a hierarchical predictive processing model. The growing body of such results leads Muckli to comment that:
Sensory stimulation might be the minor task of the cortex, whereas its major task is to … predict upcoming stimulation as precisely as possible. (Muckli, 2010, p. 137)
In a similar vein, Rauss et al. (2011) suggest that on such accounts:
Neural signals are related less to a stimulus per se than to its congruence with internal goals and predictions, calculated on the basis of previous input to the system. (Rauss et al., 2011, p. 1249)
Attention fits very neatly into this emerging unified picture, as a means of variably balancing the potent interactions between top-down and bottom-up influences by factoring in their precision (degree of uncertainty). This is achieved by altering the gain (the “volume,” to use a common analogy) on the error-units accordingly. The upshot of this is to “control the relative influence of prior expectations at different levels” (Friston, 2009, p. 299). In recent work, effects of the neurotransmitter dopamine are presented as one possible neural mechanism for encoding precision (see Fletcher and Frith [2009, pp. 53–54] who refer the reader to work on prediction error and the mesolimbic dopaminergic system such as Holleman and Schultz, 1998; Waelti et al., 2001). Greater precision (however encoded) means less uncertainty, and is reflected in a higher gain on the relevant error units (see Friston, 2005, 2010; Friston et al., 2009). Attention, if this is correct, is simply one means by which certain error-unit responses are given increased weight, hence becoming more apt to drive learning and plasticity, and to engage compensatory action.
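The gain ("volume") idea reduces to a one-line update rule, sketched here with arbitrary toy values of my own: the very same prediction error drives a larger belief revision when its assumed precision is high.

```python
# Sketch of attention as precision-weighting (toy values): the same
# prediction error produces a larger update when its assumed precision
# (inverse variance; the "gain" or "volume" on the error units) is high.

def precision_weighted_update(belief, observation, precision, lr=1.0):
    error = observation - belief           # sensory prediction error
    return belief + lr * precision * error # error scaled by its precision

low = precision_weighted_update(belief=0.0, observation=1.0, precision=0.1)
high = precision_weighted_update(belief=0.0, observation=1.0, precision=0.9)
print(low, high)  # high-precision error moves the belief much further
```

Attending to a signal, on this gloss, just is raising the precision assigned to its prediction errors, so that they dominate subsequent inference, learning, and action.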
More generally, this means that the precise mix of top-down and bottom-up influence is not static or fixed. Instead, the weight given to sensory prediction error is varied according to how reliable (how noisy, certain, or uncertain) the signal is taken to be. This is (usually) good news, as it means we are not (not quite) slaves to our expectations. Successful perception requires the brain to minimize surprisal. But the agent is able to see very (agent-) surprising things, at least in conditions where the brain assigns high reliability to the driving signal. Importantly, that requires that other high-level theories, though of an initially agent-unexpected kind, win out so as to reduce surprisal by explaining away the highly weighted sensory evidence. In extreme and persistent cases (more on this in sect. 17.4.2), this may require gradually altering the underlying generative model itself, in what Fletcher and Frith (2009, p. 53) nicely describe as a “reciprocal interaction between perception and learning.”
All this makes the lines between perception and cognition fuzzy, perhaps even vanishing. In place of any real distinction between perception and belief we now get variable differences in the mixture of top-down and bottom-up influence, and differences of temporal and spatial scale in the internal models that are making the predictions. Top-level (more “cognitive”) models16 intuitively correspond to increasingly abstract conceptions of the world, and these tend to capture or depend upon regularities at larger temporal and spatial scales. Lower-level (more “perceptual”) ones capture or depend upon the kinds of scale and detail most strongly associated with specific kinds of perceptual contact. But it is the precision-modulated, constant, content-rich interactions between these levels, often mediated by ongoing motor action of one kind or another, that now emerge as the heart of intelligent, adaptive response.
These accounts thus appear to dissolve, at the level of the implementing neural machinery, the superficially clean distinction between perception and knowledge/belief. To perceive the world just is to use what you know to explain away the sensory signal across multiple spatial and temporal scales. The process of perception is thus inseparable from rational (broadly Bayesian) processes of belief fixation, and context (top-down) effects are felt at every intermediate level of processing. As thought, sensing, and movement here unfold, we discover no stable or well-specified interface or interfaces between cognition and perception. Believing and perceiving, although conceptually distinct, emerge as deeply mechanically intertwined. They are constructed using the same computational resources, and (as we shall see in sect. 17.4.2) are mutually, reciprocally, entrenching.
Action-oriented (hierarchical) predictive processing models promise to bring cognition, perception, action, and attention together within a common framework. This framework suggests probability-density distributions induced by hierarchical generative models as our basic means of representing the world, and prediction-error minimization as the driving force behind learning, action-selection, recognition, and inference. Such a framework offers new insights into a wide range of specific phenomena including non-classical receptive field effects, bi-stable perception, cue integration, and the pervasive context-sensitivity of neuronal response. It makes rich and illuminating contact with work in cognitive neuroscience while boasting a firm foundation in computational modeling and Bayesian theory. It thus offers what is arguably the first truly systematic bridge17 linking three of our most promising tools for understanding mind and reason: cognitive neuroscience, computational modeling, and probabilistic Bayesian approaches to dealing with evidence and uncertainty.
According to Mumford:
In the ultimate stable state, the deep pyramidals [conveying predictions downwards] would send a signal that perfectly predicts what each lower area is sensing, up to expected levels of noise, and the superficial pyramidals [conveying prediction errors upwards] wouldn’t fire at all. (Mumford, 1992, p. 247)
In an intriguing footnote, Mumford then adds:
In some sense, this is the state that the cortex is trying to achieve: perfect prediction of the world, like the oriental Nirvana, as Tai-Sing Lee suggested to me, when nothing surprises you and new stimuli cause the merest ripple in your consciousness. (op. cit., p. 247, Note 5)
This remark highlights a very general worry that is sometimes raised in connection with the large-scale claim that cortical processing fundamentally aims to minimize prediction error, thus quashing the forward flow of information and achieving what Mumford evocatively describes as the “ultimate stable state.” It can be put like this:
How can a neural imperative to minimize prediction error by enslaving perception, action, and attention accommodate the obvious fact that animals don’t simply seek a nice dark room and stay in it? Surely staying still inside a darkened room would afford easy and nigh-perfect prediction of our own unfolding neural states? Doesn’t the story thus leave out much that really matters for adaptive success: things like boredom, curiosity, play, exploration, foraging, and the thrill of the hunt?
The simple response (correct, as far as it goes) is that animals like us live and forage in a changing and challenging world, and hence “expect” to deploy quite complex “itinerant” strategies (Friston, 2010; Friston et al., 2009) to stay within our species-specific window of viability. Change, motion, exploration, and search are themselves valuable for creatures living in worlds where resources are unevenly spread and new threats and opportunities continuously arise. This means that change, motion, exploration, and search themselves become predicted—and poised to enslave action and perception accordingly. One way to unpack this idea would be to look at the possible role of priors that induce motion through a state space until an acceptable, though possibly temporary or otherwise unstable, stopping point (an attractor) is found. In precisely this vein Friston (2011a, p. 113) comments that “some species are equipped with prior expectations that they will engage in exploratory or social play.”
The whole shape of this space of prior expectations is specific to different species and may also vary as a result of learning and experience. Hence, nothing in the large-scale story about prediction error minimization dictates any general or fixed balance between what is sometimes glossed as “exploration” versus “exploitation” (for some further discussion of this issue, see Friston and Stephan, 2007, pp. 435–36). Instead, different organisms amount (Friston, 2011a) to different “embodied models” of their specific needs and environmental niches, and their expectations and predictions are formed, encoded, weighted, and computed against such backdrops. This is both good news and bad news. It’s good because it means the stories on offer can indeed accommodate all the forms of behavior (exploration, thrill-seeking, etc.) we see. But it’s bad (or at least, limiting) because it means that the accounts don’t in themselves tell us much at all about these key features: features which nonetheless condition and constrain an organism’s responses in a variety of quite fundamental ways.
In one way, of course, this is clearly unproblematic. The briefest glance at the staggering variety of biological (even mammalian) life forms tells us that whatever fundamental principles are sculpting life and mind, they are indeed compatible with an amazing swathe of morphological, neurological, and ethological outcomes. But in another way it can still seem disappointing. If what we want to understand is the specific functional architecture of the human mind, the distance between these very general principles of prediction-error minimization and the specific solutions to adaptive needs that we humans have embraced remains daunting. As a simple example, notice that the predictive processing account leaves wide open a variety of deep and important questions concerning the nature and format of human neural representation. The representations on offer are, we saw, constrained to be probabilistic (and generative model based) through and through. But that is compatible with the use of the probabilistic-generative mode to encode information using a wide variety of different schemes and surface forms. Consider the well-documented differences in the way the dorsal and ventral visual streams code for attributes of the visual scene. The dorsal stream (Milner and Goodale, 2006) looks to deploy modes of representation and processing that are at some level of interest quite distinct from those coded and computed in the ventral stream. And this will be true even if there is indeed, at some more fundamental level, a common computational strategy at work throughout the visual and the motor cortex.
Discovering the nature of various inner representational formats is thus representative of the larger project of uncovering the full shape of the human cognitive architecture. It seems likely that, as argued by Eliasmith (2007), this larger project will demand a complex combination of insights, some coming “top-down” from theoretical (mathematical, statistical, and computational) models, and others coming “bottom-up” from neuroscientific work that uncovers the brain’s actual resources as sculpted by our unique evolutionary (and—as we’ll next see—sociocultural) trajectory.
Back in the late 1970s and early 1980s (the heyday of classical Artificial Intelligence [AI]) there was a widely held view that two personality types were reflected in theorizing about the human mind. These types were dubbed, by Roger Schank and Robert Abelson, the “neats” versus the “scruffies.”18 Neats believed in a few very general, truth-conducive principles underlying intelligence. Scruffies saw intelligence as arising from a varied bag of tricks: a rickety tower of rough-and-ready solutions to problems, often assembled using various quick patches and local ploys, and greedily scavenging the scraps and remnants of solutions to other, historically prior, problems and needs. Famously, this can lead to scruffy, unreliable, or sometimes merely unnecessarily complex solutions to ecologically novel problems such as planning economies, building railway networks, and maintaining the Internet. Such historically path-dependent solutions were sometimes called “kluges”—see, for example, Clark (1987) and Marcus (2008). Neats favored logic and provably correct solutions, while scruffies favored whatever worked reasonably well, fast enough, in the usual ecological setting, for some given problem. The same kind of division emerged in early debates between connectionist and classical AI (see, e.g., Sloman, 1990), with connectionists often accused of developing systems whose operating principles (after training on some complex set of input-output pairs) were opaque and “messy.” The conflict reappears in more recent debates (Griffiths et al., 2010; McClelland et al., 2010) between those favoring “structured probabilistic approaches” and those favoring “emergentist” approaches (where these are essentially connectionist approaches of the parallel distributed processing variety).19
My own sympathies (Clark, 1989, 1997) have always lain more on the side of the scruffies. Evolved intelligence, it seemed to me (Clark, 1987), was bound to involve a kind of unruly motley of tricks and ploys, with significant path-dependence, no premium set on internal consistency, and fast effective situated response usually favored at the expense of slower, more effortful, even if more truth-conducive modes of thought and reasoning. Seen through this lens, the “Bayesian brain” seems, at first glance, to offer an unlikely model for evolved biological intelligence. Implemented by hierarchical predictive processing, it posits a single, fundamental kind of learning algorithm (based on generative models, predictive coding, and prediction-error minimization) that approximates the rational ideal of Bayesian belief update. Suppose such a model proves correct. Would this amount to the final triumph of the neats over the scruffies? I suspect it would not, and for reasons that shed additional light upon the questions about scope and limits raised in the previous section.
Favoring the “neats,” we have encountered a growing body of evidence (see section 17.2.2) showing that for many basic problems involving perception and motor control, human agents (as well as other animals) do indeed manage to approximate the responses and choices of optimal Bayesian observers and actors. Nonetheless, a considerable distance still separates such models from the details of their implementation in humans or other animals. It is here that the apparent triumph of the neats over the scruffies may be called into question. For the Bayesian brain story tells us, at most, what the brain (or better, the brain in action) manages to compute. It also suggests a good deal about the forms of representation and computation that the brain must deploy: For example, it suggests that the brain must deploy a probabilistic representation of sensory information; that it must take into account uncertainty in its own sensory signals, estimate the “volatility” (frequency of change) of the environment itself (Yu, 2007), and so on. But that still leaves plenty of room for debate and discovery as regards the precise shape of the large-scale cognitive architecture within which all this occurs.
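The optimal-observer benchmark invoked here has a simple canonical form. The following is a hedged toy sketch (illustrative numbers, not data from the cited studies): an ideal Bayesian observer fuses two independent noisy cues by weighting each in proportion to its reliability (inverse variance), and human performance in many perceptual tasks approximates this rule.

```python
# Toy sketch of the ideal Bayesian observer for cue combination.
# All numbers are illustrative assumptions, not results from the studies cited.

def combine_cues(mu_a, var_a, mu_b, var_b):
    """Posterior mean and variance for two independent Gaussian cues."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_b)  # reliability-based weight
    mu = w_a * mu_a + (1 - w_a) * mu_b           # estimate leans on the better cue
    var = 1 / (1 / var_a + 1 / var_b)            # fused estimate beats either cue alone
    return mu, var

# e.g., a sharp visual size estimate and a noisier haptic one:
mu, var = combine_cues(mu_a=10.0, var_a=1.0, mu_b=12.0, var_b=4.0)
print(round(mu, 3), round(var, 3))  # 10.4 0.8
```

Note that the fused variance (0.8) is lower than that of either cue alone, which is the signature of near-optimal integration reported in the psychophysical literature.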
How, finally, do the accounts on offer relate to a human mental life? This, of course, is the hardest—though potentially the most important—question of all. I cannot hope to adequately address it in the present treatment, but a few preliminary remarks may help to structure a space for subsequent discussion.
To what extent, if any, do these stories capture or explain facts about what we might think of as personal (or agent-level) cognition—the flow of thoughts, reasons, and ideas that characterize daily conscious thought and reason? A first (but fortunately merely superficial) impression is that they fall far short of illuminating personal-level experience. For example, there seems to be a large disconnect between surprisal (the implausibility of some sensory state given a model of the world) and agent-level surprise. This is evident from the simple fact that the percept that, overall, best minimizes surprisal (hence minimizes prediction errors) “for” the brain may well be, for me the agent, some highly surprising and unexpected state of affairs—imagine, for example, the sudden unveiling of a large and doleful elephant elegantly smuggled onto the stage by a professional magician.
The two perspectives are, however, easily reconciled. The large and doleful elephant is best understood as improbable but not (at least not in the relevant sense—recall sect. 17.3.1) surprising. Instead, that percept is the one that best respects what the system knows and expects about the world, given the current combination of driving inputs and assigned precision (reflecting the brain’s degree of confidence in the sensory signal). Given the right driving signal and a high enough assignment of precision, top-level theories of an initially agent-unexpected kind can still win out so as to explain away that highly weighted tide of incoming sensory evidence. The sight of the doleful elephant may then emerge as the least surprising (least “surprisal-ing”!) percept available, given the inputs, the priors, and the current weighting on sensory prediction error. Nonetheless, systemic priors did not render that percept very likely in advance, hence (perhaps) the value to the agent of the feeling of surprise.
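The reconciliation can be made concrete with a toy Bayesian update (my own illustrative numbers, not the author's): the "elephant" hypothesis starts with a tiny prior, yet once the high-precision sensory evidence arrives it becomes the posterior winner, the percept that best explains the input.

```python
# Toy illustration of surprisal vs. agent-level surprise.
# Priors and likelihoods are invented for the example.

priors = {"empty stage": 0.900, "magician's props": 0.099, "elephant": 0.001}
# How strongly the actual high-precision sensory input supports each hypothesis:
likelihood = {"empty stage": 1e-6, "magician's props": 1e-4, "elephant": 0.9}

evidence = sum(priors[h] * likelihood[h] for h in priors)
posterior = {h: priors[h] * likelihood[h] / evidence for h in priors}

best = max(posterior, key=posterior.get)
print(best)          # elephant: the percept that best explains the input wins...
print(priors[best])  # 0.001 ...despite a prior that made it agent-surprising
```

The winning percept minimizes surprisal given the model and the evidence, while its low prior probability is exactly what makes it surprising to the agent in advance.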
It might be suggested that merely accommodating the range of human personal-level experiences is one thing, while truly illuminating them is another. Such positive impact is, however, at least on the horizon. We glimpse the potential in an impressive body of recent work conducted within the predictive processing (hierarchical predictive coding) framework addressing delusions and hallucination in schizophrenia (Corlett et al., 2009a; Fletcher and Frith, 2009).
Recall the unexpected sighting of the elephant described in the previous section. Here, the system already commanded an apt model able to “explain away” the particular combination of driving inputs, expectations, and precision (weighting on prediction error) that specified the doleful, gray presence. But such is not always the case. Sometimes, dealing with ongoing, highly-weighted sensory prediction error may require brand new generative models gradually to be formed (just as in normal learning). This might hold the key, as Fletcher and Frith (2009) suggest, to a better understanding of the origins of hallucinations and delusion (the two “positive symptoms”) in schizophrenia. These two symptoms are often thought to involve two mechanisms and hence two breakdowns, one in “perception” (leading to the hallucinations) and one in “belief” (allowing these abnormal perceptions to impact top-level belief). It seems correct (see, e.g., Coltheart, 2007) to stress that perceptual anomalies alone will not typically lead to the strange and exotic belief complexes found in delusional subjects. But must we therefore think of the perceptual and doxastic components as effectively independent?
A possible link emerges if perception and belief-formation, as the present story suggests, both involve the attempt to match unfolding sensory signals with top-down predictions. Importantly, the impact of such attempted matching is precision-mediated in that the systemic effects of residual prediction error vary according to the brain’s confidence in the signal (sect. 17.2.2). With this in mind, Fletcher and Frith (2009) canvass the possible consequences of disturbances to a hierarchical Bayesian system such that prediction error signals are falsely generated and—more important—highly weighted (hence accorded undue salience for driving learning).
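A minimal toy model (my own construction, in the spirit of the proposal rather than a model from Fletcher and Frith) shows why the weighting matters: the same stream of pure noise barely moves a sensibly weighted estimate, but drives large revisions when prediction errors are falsely assigned high precision.

```python
# Toy precision-weighted updating. Values and the learning rule are
# illustrative assumptions, not taken from the cited work.

def update(estimate, signal, precision, lr=0.1):
    error = signal - estimate                  # prediction error
    return estimate + lr * precision * error   # precision gates its impact

noise = [0.9, -1.1, 1.0, -0.8, 1.2]            # pure noise around a true value of 0

est_normal = est_inflated = 0.0
for n in noise:
    est_normal = update(est_normal, n, precision=0.5)
    est_inflated = update(est_inflated, n, precision=5.0)

# The inflated-precision estimate swings far more on identical input:
print(round(abs(est_normal), 3), round(abs(est_inflated), 3))
```

With sensible precision the estimate stays close to the true value of zero; with inflated precision the system "learns" from noise, which is the structural point behind the falsely weighted "false errors" discussed below.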
There are a number of potential mechanisms whose complex interactions, once treated within the overarching framework of prediction error minimization, might conspire to produce such disturbances. Prominent contenders include the action of slow neuromodulators such as dopamine, serotonin, and acetylcholine (Corlett et al., 2009a, 2010). In addition, Friston (2010, p. 132) speculates that fast, synchronized activity between neural areas may also play a role in increasing the gain on prediction error within the synchronized populations.20 The key idea, however implemented, is that understanding the positive symptoms of schizophrenia requires understanding disturbances in the generation and weighting of prediction error. The suggestion (Corlett et al., 2009a,b; Fletcher and Frith, 2009) is that malfunctions within that complex economy (perhaps fundamentally rooted in abnormal dopaminergic functioning) yield wave upon wave of persistent and highly weighted “false errors” that then propagate all the way up the hierarchy forcing, in severe cases (via the ensuing waves of neural plasticity), extremely deep revisions in our model of the world. The improbable (telepathy, conspiracy, persecution, etc.) then becomes the least surprising, and—because perception is itself conditioned by the top-down flow of prior expectations—the cascade of misinformation reaches back down, allowing false perceptions and bizarre beliefs to solidify into a coherent and mutually supportive cycle.
Such a process is self-entrenching. As new generative models take hold, their influence flows back down so that incoming data is sculpted by the new (but now badly misinformed) priors so as to “conform to expectancies” (Fletcher and Frith, 2009, p. 348). False perceptions and bizarre beliefs thus form an epistemically insulated self-confirming cycle.21 This, then, is the dark side of the seamless story (sect. 17.2) about perception and cognition. The predictive processing model merges—usually productively—perception, belief, learning, and affect into a single overarching economy: one within which dopamine and other neurotransmitters control the “precision” (the weighting, hence the impact on inference and on learning) of prediction error itself. But when things go wrong, false inferences spiral and feed back upon themselves. Delusion and hallucination then become entrenched, being both co-determined and co-determining.
The same broadly Bayesian framework can be used (Corlett et al., 2009a) to help make sense of the ways in which different drugs, when given to healthy volunteers, can temporarily mimic various forms of psychosis. Here, too, the key feature is the ability of the predictive coding framework to account for complex alterations in both learning and experience contingent upon the (pharmacologically modifiable) way driving sensory signals are meshed, courtesy of precision-weighted prediction errors, with prior expectancies and (hence) ongoing prediction. The psychotomimetic effects of ketamine, for example, are said to be explicable in terms of a disturbance to the prediction error signal (perhaps caused by AMPA upregulation) and the flow of prediction (perhaps via NMDA interference). This leads to a persistent prediction error and—crucially—an inflated sense of the importance or salience of the associated events, which in turn drives the formation of short-lived delusion-like beliefs (see Corlett et al., 2009a, pp. 6–7; also, discussion in Gerrans, 2007). The authors go on to offer accounts of the varying psychotomimetic effects of other drugs (such as LSD and other serotonergic hallucinogens, cannabis, and dopamine agonists such as amphetamine) as reflecting other possible varieties of disturbance within a hierarchical predictive processing framework.22
Another area in which these models are suggestive of deep facts about the nature and construction of human experience concerns the character of perception and the relations between perception and imagery/visual imagination. Prediction-driven processing schemes, operating within hierarchical regimes of the kind described above, learn probabilistic generative models in which each neural population targets the activity patterns displayed by the neural population below. What is crucial here—what makes such models generative as we saw in section 17.1.1—is that they can be used “top-down” to predict activation patterns in the level below. The practical upshot is that such systems, simply as part and parcel of learning to perceive, develop the ability to self-generate23 perception-like states from the top down, by driving the lower populations into the predicted patterns.
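A deliberately tiny linear sketch of this duality (the weights, learning rate, and inference scheme are all illustrative assumptions, not a model from the text): the same top-down weights serve perception, by settling on the cause that minimizes prediction error, and imagery, by generating the pattern a given cause predicts.

```python
# Toy two-level generative model: one hidden cause, three "sensory" units.
# Everything here is an illustrative construction.

W = [0.5, 1.0, -0.5]  # top-down weights: hidden cause -> predicted sensory pattern

def generate(cause):
    """Top-down pass: the sensory pattern the model predicts for a cause."""
    return [w * cause for w in W]

def perceive(data, cause=0.0, lr=0.05, steps=200):
    """Iteratively adjust the cause estimate to shrink prediction error."""
    for _ in range(steps):
        errors = [d - p for d, p in zip(data, generate(cause))]
        cause += lr * sum(e * w for e, w in zip(errors, W))
    return cause

data = generate(2.0)             # an input produced by a "real" cause of 2.0
print(round(perceive(data), 3))  # perception recovers ~2.0
print(generate(2.0))             # imagery: the same machinery run top-down
```

The point of the sketch is structural: once a system can predict its inputs top-down, running that same machinery without clamping the input yields perception-like states, which is the duality the paragraph describes.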
There thus emerges a rather deep connection between perception and the potential for self-generated forms of mental imagery (Kosslyn et al., 1995; Reddy et al., 2010). Probabilistic generative model based systems that can learn to visually perceive a cat (say) are, ipso facto, systems that can deploy a top-down cascade to bring about many of the activity patterns that would ensue in the visual presence of an actual cat. Such systems thus display (for more discussion of this issue, see Clark (2015a)) a deep duality of perception and imagination.24 The same duality is highlighted by Grush (2004) in the “emulator theory of representation,” a rich and detailed treatment that shares a number of key features with the predictive processing story.25
Just how radical is the story we have been asked to consider? Is it best seen as an alternative to mainstream computational accounts that posit a cascade of increasingly complex feature detection (perhaps with some top-down biasing), or is it merely a supplement to them: one whose main virtue lies in its ability to highlight the crucial role of prediction error in driving learning and response? I do not think we are yet in a position to answer this question with any authority. But the picture I have painted suggests an intermediate verdict, at least with respect to the central issues concerning representation and processing.
Concerning representation, the stories on offer are potentially radical in at least two respects. First, they suggest that probabilistic generative models underlie both sensory classification and motor response. And second, they suggest that the forward flow of sensory data is replaced by the forward flow of prediction error. This latter aspect can, however, make the models seem even more radical than they actually are: Recall that the forward flow of prediction error is here combined with a downward flow of predictions, and at every stage of processing the models posit (as we saw in some detail in sect. 17.2.1) functionally distinct “error units” and “representation units.” The representation units that communicate predictions downward do indeed encode increasingly complex and more abstract features (capturing context and regularities at ever-larger spatial and temporal scales) in the processing levels furthest removed from the raw sensory input. In a very real sense then, much of the standard architecture of increasingly complex feature detection is here retained. What differs is the shape of the flow of information, and (relatedly) the pivotal role assigned to the computation and propagation of prediction error.
A related issue concerns the extent to which the new framework reproduces traditional insights concerning the specialization of different cortical areas. This is a large question whose full resolution remains beyond the scope of the present discussion. But in general, the hierarchical form of these models suggests a delicate combination of specialization and integration. Different levels learn and deploy different sets of predictions, corresponding to different bodies of knowledge, aimed at the level below (specialization), but the system settles in a way largely determined by the overall flow and weighting of prediction error, where this flow is itself varied according to current context and the reliability and relevance of different types of information (integration).26
A second source of potential radicalism lies with the suggestion (sect. 17.1.3) that, in extending the models to include action (“action-oriented predictive processing”), we might simultaneously do away with the need to appeal to goals and rewards, replacing them with the more austere construct of predictions. In this vein, we read that:
Crucially, active inference does not invoke any “desired consequences.” It rests only on experience-dependent learning and inference: Experience induces prior expectations, which guide perceptual inference and action. (Friston et al., 2011, p. 157)
In this desert landscape vision, there are neither goals nor reward signals as such. Instead, there are only (both learnt and species-specific) expectations, across many spatial and temporal scales, which directly enslave both perception and action. Cost functions, in other words, are replaced by expectations concerning actions and their sensory (especially proprioceptive) consequences. Here, I remain unconvinced. For even if such an austere description is indeed possible (and for some critical concerns, see Gershman and Daw, 2012), that would not immediately justify our claiming that it thereby constitutes the better tool for understanding the rich organization of the cognitive economy. To see this, we need only reflect that it’s all “just” atoms, molecules, and the laws of physics too, but that doesn’t mean those provide the best constructs and components for the systemic descriptions attempted by cognitive science. The desert landscape theorist thus needs to do more, it seems to me, to demonstrate the explanatory advantages of abandoning more traditional appeals to value, reward, and cost (or perhaps to show that those appeals make unrealistic demands on processing or implementation—see Friston, 2011b).
What may well be right about the desert landscape story, it seems to me, is the suggestion that utility (or more generally, personal and hedonic value) is not simply a kind of add-on, implemented by what Gershman and Daw (2012, p. 296) describe as a “segregated representation of probability and utility in the brain.” Instead, it seems likely that we represent the very events over which probabilities become defined in ways that ultimately fold in their personal, affective, and hedonic significance. This folding-in is probably especially marked in frontolimbic cortex (Merker, 2004). But the potent web of backward connections ensures that such folding-in, once it has occurred, is able (as noted by Barrett and Bar, 2009) to impact processing and representation at every lower stage of the complex processing hierarchy. If this proves correct, then it is prediction error calculated relative to these affectively rich and personal-history-laden expectations that drives learning and response.
Thus construed, an action-oriented predictive processing framework is not so much revolutionary as it is reassuringly integrative. Its greatest value lies in suggesting a set of deep unifying principles for understanding multiple aspects of neural function and organization. It does this by describing an architecture capable of combining high-level knowledge and low-level (sensory) information in ways that systematically deal with uncertainty, ambiguity, and noise. In so doing it reveals perception, action, learning, and attention as different but complementary means to the reduction of (potentially affect-laden and goal-reflecting) prediction error in our exchanges with the world. It also, and simultaneously, displays human learning as sensitively responsive to the deep statistical structures present in both our natural and human-built environments. Thus understood, action-oriented predictive processing leaves much unspecified, including (1) the initial variety of neural and bodily structures (and perhaps internal representational forms) mandated by our unique evolutionary trajectory, and (2) the acquired variety of “virtual” neural structures and representational forms installed by our massive immersion in “designer environments” during learning and development.
To fill in these details requires, or so I have argued, a deep (but satisfyingly natural) engagement with evolutionary, embodied, and situated approaches. Within that context, seeing how perception, action, learning, and attention might all be constructed out of the same base materials (prediction and prediction error minimization) is powerful and illuminating. It is there that Friston’s ambitious synthesis is at its most suggestive, and it is there that we locate the most substantial empirical commitments of the account. Those commitments are to the computation (by dedicated error units or some functionally equivalent means) and widespread use by the nervous system of precision-weighted prediction error, and its use as proxy for the forward flow of sensory information. The more widespread this is, the greater the empirical bite of the story. If it doesn’t occur, or occurs only in a few special circumstances, the story fails as a distinctive empirical account.27
Action-oriented predictive processing models come tantalizingly close to overcoming some of the major obstacles blocking previous attempts to ground a unified science of mind, brain, and action. They take familiar elements from existing, well-understood, computational approaches (such as unsupervised and self-supervised forms of learning using recurrent neural network architectures, and the use of probabilistic generative models for perception and action) and relate them, on the one hand, to a priori constraints on rational response (the Bayesian dimension), and, on the other hand, to plausible and (increasingly) testable accounts of neural implementation. It is this potent positioning between the rational, the computational, and the neural that is their most attractive feature. In some ways, they provide the germ of an answer to Marr’s dream: a systematic approach that addresses the levels of (in the vocabulary of Marr, 1982) the computation, the algorithm, and the implementation.
The sheer breadth of application is striking. Essentially the same models here account for a variety of superficially disparate effects spanning perception, action, and attention. Indeed, one way to think about the primary “added value” of these models is that they bring perception, action, and attention into a single unifying framework. They thus constitute the perfect explanatory partner, I have argued, for recent approaches that stress the embodied, environmentally embedded, dimensions of mind and reason.28 Perception, action, and attention, if these views are correct, are all in the same family business: that of reducing sensory prediction error resulting from our exchanges with the environment. Once this basic family business is revealed, longer-term environmental structuring (both material and socio-cultural) falls neatly into place. We structure our worlds and actions so that most of our sensory predictions come true.
But this neatness hides important complexity. For, another effect of all that material and socio-cultural scaffolding is to induce substantial path-dependence as we confront new problems using pre-existing material tools and inherited social structures. The upshot, or so I have argued, is that a full account of human cognition cannot hope to “jump” directly from the basic organizing principles of action-oriented predictive processing to an account of the full (and in some ways idiosyncratic) shape of human thought and reason.
What emerges instead is a kind of natural alliance. The basic organizing principles highlighted by action-oriented predictive processing make us superbly sensitive to the structure and statistics of the training environment. But our human training environments are now so thoroughly artificial, and our explicit forms of reasoning so deeply infected by various forms of external symbolic scaffolding, that understanding distinctively human cognition demands a multiply hybrid approach. Such an approach would combine the deep computational insights coming from probabilistic generative approaches (among which figure action-oriented predictive processing) with solid neuroscientific conjecture and with a full appreciation of the way our many self-structured environments alter and transform the problem spaces of human reason. The most pressing practical questions thus concern what might be thought of as the “distribution of explanatory weight” between the accounts on offer, and approaches that explore or uncover these more idiosyncratic or evolutionary path-dependent features of the human mind, and the complex transformative effects of the socio-cultural cocoon in which it develops.
Questions also remain concerning the proper scope of the basic predictive processing account itself. Can that account really illuminate reason, imagination, and action selection in all its diversity? What do the local approximations to Bayesian reasoning look like as we depart further and further from the safe shores of basic perception and motor control? What new forms of representation are then required, and how do they behave in the context of the hierarchical predictive coding regime? How confident are we of the basic Bayesian gloss on our actual processing? (Do we, for example, have a firm enough grip on when a system is computing its outputs using a “genuine approximation” to a true Bayesian scheme, rather than merely behaving “as if” it did so?)
The challenges (empirical, conceptual, and methodological) are many and profound. But the potential payoff is huge. What is on offer is a multilevel account of some of the deepest natural principles underlying learning and inference, and one that may be capable of bringing perception, action, and attention under a single umbrella. The ensuing exchanges between neuroscience, computational theorizing, psychology, philosophy, rational decision theory, and embodied cognitive science promise to be among the major intellectual events of the early twenty-first century.
1. This remark is simply described as a “scribbled, undated, aphorism” in the online digital archive of the scientist’s journal: see http://www.rossashby.info/index.html.
2. I am greatly indebted to an anonymous BBS referee for encouraging me to bring these key developments into clearer (both historical and conceptual) focus.
3. The obvious problem was that this generative model itself needed to be learnt: something that would in turn be possible if a good recognition model was already in place, since that could provide the right targets for learning the generative model. The solution (Hinton et al., 1995) was to use each to gradually bootstrap the other, using the so-called “wake-sleep algorithm”—a computationally tractable approximation to “maximum likelihood learning” as seen in the expectation-maximization (EM) algorithm of Dempster et al. (1977). Despite this, the Helmholtz Machine remained slow and unwieldy when confronted with complex problems requiring multiple layers of processing. But it represents an important early version of an unsupervised multilayer learning device, or “deep architecture” (Hinton, 2002, 2007b, 2010; Hinton and Salakhutdinov, 2006; Hinton et al., 2006; for reviews, see Bengio, 2009; Hinton, 2007a).
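The alternating structure the note describes, in which inference about hidden causes and learning of the model bootstrap one another, is the pattern made precise by the EM algorithm of Dempster et al. that the note cites. The following toy sketch (my own illustration, not code from Hinton et al. or Dempster et al.) applies it to a classic two-coin problem: each session of flips comes from one of two coins of unknown bias, and the coin identity is never observed.

```python
# Toy EM in the spirit of Dempster et al.'s scheme (illustrative only):
# the E-step infers the hidden coin identity ("recognition"), the M-step
# refits the coin biases ("generation"), and each bootstraps the other.

data = [9, 8, 1, 2, 8]   # heads observed out of N flips per session
N = 10

def em(data, theta=(0.6, 0.5), iters=50):
    tA, tB = theta
    for _ in range(iters):
        heads_A = heads_B = tot_A = tot_B = 0.0
        for h in data:
            # E-step: posterior responsibility of coin A for this session
            like_A = (tA ** h) * ((1 - tA) ** (N - h))
            like_B = (tB ** h) * ((1 - tB) ** (N - h))
            rA = like_A / (like_A + like_B)
            heads_A += rA * h
            tot_A += rA * N
            heads_B += (1 - rA) * h
            tot_B += (1 - rA) * N
        # M-step: maximum-likelihood re-estimate of each coin's bias
        tA, tB = heads_A / tot_A, heads_B / tot_B
    return tA, tB

print(em(data))  # the sessions separate into a high-bias and a low-bias coin
```

Neither step is possible in isolation at the start, yet alternating them converges: exactly the chicken-and-egg situation the wake-sleep algorithm was designed to escape.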
4. This names the probability of an event (here, a worldly cause), given some set of prior beliefs and the evidence (here, the current pattern of sensory stimulation). For our purposes, it thus names the probability of a worldly (or bodily) cause, conditioned on the sensory consequences.
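The posterior the note defines is computed by Bayes’ rule. Here is a minimal sketch with made-up numbers: two hypothetical worldly causes of a single sensory pattern. The cause names and all probabilities are illustrative assumptions, not anything given in the text.

```python
# Bayes' rule for the posterior the note describes: the probability of a
# worldly cause given the current sensory evidence. Toy numbers throughout.

def posterior(prior: dict, likelihood: dict, evidence: str) -> dict:
    """Return P(cause | evidence) via Bayes' rule."""
    unnorm = {c: prior[c] * likelihood[c][evidence] for c in prior}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}

# Two hypothetical hidden causes of a "rustling-sound" sensory pattern:
prior = {"wind": 0.7, "animal": 0.3}
likelihood = {
    "wind":   {"rustling-sound": 0.2},
    "animal": {"rustling-sound": 0.8},
}

post = posterior(prior, likelihood, "rustling-sound")
print(post)  # the initially less probable cause now dominates (~0.63)
```

Even a strong prior against “animal” is overturned when the evidence is much more likely under that cause: the normalized products are 0.14/0.38 and 0.24/0.38.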
5. In speaking of “predictive processing” rather than resting with the more common usage “predictive coding,” I mean to highlight the fact that what distinguishes the target approaches is not simply the use of the data compression strategy known as predictive coding. Rather, it is the use of that strategy in the special context of hierarchical systems deploying probabilistic generative models. Such systems exhibit powerful forms of learning and are able flexibly to combine top-down and bottom-up flows of information within a multilayer cascade.
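A deliberately minimal sketch may help convey the core dynamic such systems exploit: a higher-level estimate predicts the input, only the precision-weighted prediction error drives updating, and changing the relative precisions shifts the balance between top-down expectation and bottom-up signal. This is my own one-unit toy illustration, not code from the predictive-processing literature.

```python
# One-unit prediction-error minimization (illustrative only): the estimate mu
# settles where precision-weighted sensory error balances the pull of the prior.

def settle(x, mu_prior, pi_sensory=1.0, pi_prior=1.0, steps=200, lr=0.05):
    """Gradient descent on precision-weighted squared prediction error."""
    mu = mu_prior
    for _ in range(steps):
        err_sens  = x - mu          # bottom-up residual: what prediction missed
        err_prior = mu - mu_prior   # deviation from the top-down expectation
        mu += lr * (pi_sensory * err_sens - pi_prior * err_prior)
    return mu

# Equal precisions: the estimate settles midway between prior and input.
print(settle(x=2.0, mu_prior=0.0))                   # ~1.0
# High sensory precision (an attention-like gain boost): the input dominates.
print(settle(x=2.0, mu_prior=0.0, pi_sensory=9.0))   # ~1.8
```

The fixed point is the precision-weighted average (pi_s·x + pi_p·mu_prior)/(pi_s + pi_p), which is why raising the gain on sensory prediction error pulls the estimate toward the input.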
6. In what follows, the notions of prior, empirical prior, and prior belief are used interchangeably, given the assumed context of a hierarchical model.
7. Because these proposals involve the deployment of top-down probabilistic generative models within a multilayer architecture, it is the organizational structure of the neocortex that most plausibly provides the requisite implementation. This is not to rule out related modes of processing using other structures, for example, in nonhuman animals, but simply to isolate the “best fit.” Nor is it to rule out the possibility that, moment-to-moment, details of the large-scale routing of information flow within the brain might depend on gating effects that, although cortically mediated, implicate additional structures and areas. For some work on such gating effects among cortical structures themselves, see den Ouden et al. (2010).
8. I have adopted the neuroanatomists’ practice of labeling connections simply as “backward” and “forward” so as to avoid the functional implications of the labels “feedback” and “feedforward.” This is important in the context of predictive processing models, since it is now the forward connections that are really providing (by conveying prediction error) feedback on the downward-flowing predictions—see Friston (2005) and Hohwy (2007). Thanks to one of the BBS reviewers for this helpful terminological suggestion.
9. This is also known (see, e.g., Friston et al., 2009) as “active inference.” I coin “action-oriented predictive processing” as it makes clear that this is an action-encompassing generalization of the (hierarchical) predictive coding story about perception. It also suggests (rightly) that action becomes conceptually primary in these accounts, since it provides the only way (once a good world model is in place and aptly activated) to actually alter the sensory signal so as to reduce sensory prediction error—see Friston (2009, p. 295). In addition, Friston’s most recent work on active inference looks to involve a strong commitment (Friston, 2011a, see especially) to the wholesale replacement of value functions, considered as determinants of action, with expectations (“prior beliefs,” though note that “belief” here is very broadly construed) about action. This is an interesting and challenging suggestion that goes beyond claims concerning formal equivalence and even beyond the observations concerning deep conceptual relations linking action and perception. “Action-oriented predictive processing,” as I shall use the term, remains deliberately agnostic on this important matter (see also sect. 17.5.1).
10. I note in passing that this radical view resonates with some influential philosophical work concerning high level (reflective) intentions and actions: specifically, Velleman’s (1989) account of practical reasoning in which intentions to act are depicted as self-fulfilling expectations about one’s own actions (see, e.g., Velleman, 1989, p. 98).
11. This kind of efficiency, as one of the BBS referees nicely noted, is something of a double-edged sword. For, the obvious efficiencies in forward processing are here bought at the cost of the multilevel generative machinery itself: machinery whose implementation and operation requires a whole set of additional connections to realize the downward swoop of the bidirectional hierarchy. The case for predictive processing is thus not convincingly made on the basis of “communicative frugality” so much as upon the sheer power and scope of the systems that result.
12. In personal correspondence, Lee de-Wit notes that his usage follows that of, for example, Murray et al. (2004) and Dumoulin and Hess (2006), both of whom contrast “predictive coding” with “efficient coding,” where the former uses top-down influence to subtract out predicted elements of lower-level activity, and the latter uses top-down influence to enhance or sharpen it. This can certainly make it look as if the two stories (subtraction and sharpening) offer competing accounts of, for example, fMRI data such as Murray et al. (2002) showing a dampening of response in early visual areas as higher areas settled into an interpretation of a shape stimulus. The accounts would be alternatives, since the dampening might then reflect either the subtraction of well-predicted parts of the early response (“predictive coding”) or the quashing of the rest of the early signal and the attendant sharpening of the consistent elements. The models I am considering, however, accommodate both subtraction and sharpening (see main text for details). This is therefore an instance (see sect. 17.5.1) in which more radical elements of the target proposals (here, the subtracting away of predicted signal elements) turn out, on closer examination, to be consistent with more familiar effects (such as top-down enhancement).
13. The consistency between selective sharpening and the dampening effects of “explaining away” also makes it harder—though not impossible—to tease apart the empirical implications of predictive coding and “evidence accumulation” accounts such as Gold and Shadlen’s (2001)—for a review, see Smith and Ratcliff (2004). For an attempt to do so, see Hesselmann et al. (2010).
14. In this (2008) treatment Spratling further argues that the forms of hierarchical predictive coding account we have been considering are mathematically equivalent to some forms of “biased competition” model, but that they nonetheless suggest different claims concerning neural implementation. I take no position on these interesting claims here.
15. I here adapt, merely for brevity of exposition, a similar example from Friston (2002, p. 237).
16. Technically, there is always a single hierarchical generative model in play. In speaking here of multiple internal models, I mean only to flag that the hierarchical structure supports many levels of processing which distribute the cognitive labor by building distinct “knowledge structures” that specialize in dealing with different features and properties (so as to predict events and regularities obtaining at differing temporal and spatial scales).
17. The clear lineage here is with work in connectionism and recurrent artificial neural networks (see, e.g., Rumelhart et al., 1986b, and early discussions such as Churchland, 1989; Clark, 1989). What is most exciting about the new proposals, it seems to me, is that they retain many of the insights from this lineage (which goes on to embrace work on Helmholtz machines and ongoing work on “deep architectures”—see sect. 17.1.1) while making explicit contact with both Bayesian theorizing and contemporary neuroscientific research and conjecture.
18. These terms, according to a memoir by Wendy Lehnert (2007), were introduced by Bob Abelson as part of a keynote address to the 3rd Annual Meeting of the Cognitive Science Society in 1981.
19. The hierarchical predictive coding family of models that (along with their extensions to action) form the main focus of the present treatment are not, in my view, happily assimilated to either of these camps. They clearly share Bayesian foundations with the “pure” structured probabilistic approaches highlighted by Griffiths et al., but their computational roots lie (as we saw in sect. 17.1.1) in work on machine learning using artificial neural networks. Importantly, however, hierarchical predictive processing models now bring “bottom-up” insights from cognitive neuroscience into increasingly productive contact with those powerful computational mechanisms of learning and inference, in a unifying framework able (as Griffiths et al. correctly stress) to accommodate a very wide variety of surface representational forms. Moreover, such approaches are computationally tractable because local (prediction-error minimizing) routines are being used to approximate Bayesian inference. For some excellent antidotes to the appearance of deep and irreconcilable conflict hereabouts, see Feldman (2010) and Lee (2010).
20. A much better understanding of such multiple interacting mechanisms (various slow neuromodulators perhaps acting in complex concert with neural synchronization) is now needed, along with a thorough examination of the various ways and levels at which the flow of prediction and the modulating effects of the weighting of prediction error (precision) may be manifest (for some early forays, see Corlett et al., 2010; see also Friston and Kiebel, 2009). Understanding more about the ways and levels at which the flow and impact of prediction error may be manipulated is vitally important if we are to achieve a better understanding of the multiple ways in which “attention” (here understood—see sect. 17.2.2—as various ways of modifying the gain on prediction error) may operate so as to bias processing by flexibly controlling the balance between top-down and bottom-up influence.
21. There are probably milder versions of this everywhere, both in science (Maher, 1988) and in everyday life. We tend to see what we expect, and we use that to confirm the model that is both generating our expectations and sculpting and filtering our observations.
22. Intriguingly, the authors are also able to apply the model to one non-pharmacological intervention: sensory deprivation.
23. This need not imply an ability deliberately to engage in such a process of self-generation. Such rich, deliberate forms of imagining may well require additional resources, such as the language-driven forms of cognitive “self-stimulation” described in Dennett (1991), Chapter 8.
24. It is perhaps worth remarking that, deep duality notwithstanding, nothing in the present view requires that the system, when engaged in imagery-based processing, will typically support the very same kinds of stability and richness of experienced detail that daily sensory engagements offer. In the absence of the driving sensory signal, no stable ongoing information about low-level perceptual details is there to constrain the processing. As a result, there is no obvious pressure to maintain or perhaps even to generate (see Reddy et al., 2010) a stable hypothesis at the lower levels: there is simply whatever task-determined downward pressure the active higher-level encoding exerts.
25. Common features include the appeal to forward models and the provision of mechanisms (such as Kalman filtering—see Friston, 2002; Grush, 2004; Rao and Ballard, 1999) for estimating uncertainty and (thus) flexibly balancing the influence of prior expectations and driving sensory inputs. Indeed, Grush (2004, p. 393) cites the seminal predictive coding work by Rao and Ballard (1999) as an account of visual processing compatible with the broader emulator framework. In addition, Grush’s account of perception as “environmental emulation” (see section 5.2 of Grush, 2004) looks highly congruent with the depiction (Friston, 2003 and elsewhere) of perception as reconstructing the hidden causes structuring the sensory signal. Where the accounts seem to differ is in the emphasis placed on prediction error as (essentially) a replacement for the sensory signal itself, the prominence of a strong Bayesian interpretation (using the resources of “empirical Bayes” applied across a hierarchy of processing stages), and the attempted replacement of motor commands by top-down proprioceptive predictions alone (for a nice treatment of this rather challenging speculation, see Friston, 2011a). It would be interesting (although beyond the scope of the present treatment) to attempt a more detailed comparison.
26. For the general story about combining specialization and integration, see Friston (2002) and discussion in Hohwy (2007). For a more recent account, including some experimental evidence concerning the possible role of prediction error in modulating inter-area coupling, see den Ouden et al. (2010).
27. The empirical bet is thus, as Egner and colleagues recently put it, that “the encoding of predictions (based on internal forward models) and prediction errors may be a ubiquitous feature of cognition in the brain … rather than a curiosity of reward learning … or motor planning” (Egner et al., 2010, p. 16607).
28. When brought under the even-more-encompassing umbrella of the “free energy principle”, the combined ambition is formidable. If these accounts were indeed to mesh in the way Friston (2010) suggests, that would reveal the very deepest of links between life and mind, confirming and extending the perspective known as “enactivist” cognitive science (see, e.g., Di Paolo, 2009; Thompson, 2007; Varela et al., 1991).
Judea Pearl
2018
If we examine the information that drives machine learning today, we find that it is almost entirely statistical. In other words, learning machines improve their performance by optimizing parameters over a stream of sensory inputs received from the environment. It is a slow process, analogous in many respects to the natural selection process that drives Darwinian evolution. It explains how species like eagles and snakes have developed superb vision systems over millions of years. It cannot explain however the super-evolutionary process that enabled humans to build eyeglasses and telescopes over barely one thousand years. What humans possessed that other species lacked was a mental representation, a blue-print of their environment which they could manipulate at will to imagine alternative hypothetical environments for planning and learning. Anthropologists like N. Harari, and S. Mithen are in general agreement that the decisive ingredient that gave our Homo sapiens ancestors the ability to achieve global dominion, about 40,000 years ago, was their ability to choreograph a mental representation of their environment, interrogate that representation, distort it by mental acts of imagination and finally answer “What if?” kind of questions. Examples are interventional questions: “What if I act?” and retrospective or explanatory questions: “What if I had acted differently?” No learning machine in operation today can answer such questions about interventions not encountered before, say, “What if we ban cigarettes.” Moreover, most learning machines today do not provide a representation from which the answers to such questions can be derived.
I postulate that the major impediment to achieving accelerated learning speeds as well as human-level performance can be overcome by removing these barriers and equipping learning machines with causal reasoning tools. This postulate would have been speculative twenty years ago, prior to the mathematization of counterfactuals. Not so today.
Advances in graphical and structural models have made counterfactuals computationally manageable and thus rendered model-driven reasoning a more promising direction on which to base strong AI. In the next section, I will describe the impediments facing machine learning systems using a three-level hierarchy that governs inferences in causal reasoning. The final section summarizes how these impediments were circumvented using modern tools of causal inference.
An extremely useful insight unveiled by the logic of causal reasoning is the existence of a sharp classification of causal information, in terms of the kind of questions that each class is capable of answering. The classification forms a 3-level hierarchy in the sense that questions at level i (i = 1, 2, 3) can only be answered if information from level j (j ≥ i) is available.
Figure 18.1 shows the 3-level hierarchy, together with the characteristic questions that can be answered at each level. The levels are titled 1. Association, 2. Intervention, and 3. Counterfactual. The names of these layers were chosen to emphasize their usage. We call the first level Association, because it invokes purely statistical relationships, defined by the naked data.1 For instance, observing a customer who buys toothpaste makes it more likely that he/she buys floss; such association can be inferred directly from the observed data using conditional expectation. Questions at this layer, because they require no causal information, are placed at the bottom level of the hierarchy. The second level, Intervention, ranks higher than Association because it involves not just seeing what is, but changing what we see. A typical question at this level would be: What happens if we double the price? Such questions cannot be answered from sales data alone, because they involve a change in customers’ behavior, in reaction to the new pricing. These choices may differ substantially from those taken in previous price-raising situations. (Unless we replicate precisely the market conditions that existed when the price reached double its current value.) Finally, the top level is called Counterfactuals, a term that goes back to the philosophers David Hume and John Stuart Mill, and which has been given computer-friendly semantics in the past two decades. A typical question in the counterfactual category is “What if I had acted differently,” thus necessitating retrospective reasoning.
Figure 18.1
The Causal Hierarchy. Questions at level i can only be answered if information from level i or higher is available.
Counterfactuals are placed at the top of the hierarchy because they subsume interventional and associational questions. If we have a model that can answer counterfactual queries, we can also answer questions about interventions and observations. For example, the interventional question, What will happen if we double the price? can be answered by asking the counterfactual question: What would happen had the price been twice its current value? Likewise, associational questions can be answered once we can answer interventional questions; we simply ignore the action part and let observations take over. The translation does not work in the opposite direction. Interventional questions cannot be answered from purely observational information (i.e., from statistical data alone). No counterfactual question involving retrospection can be answered from purely interventional information, such as that acquired from controlled experiments; we cannot re-run an experiment on subjects who were treated with a drug and see how they would have behaved had they not been given the drug. The hierarchy is therefore directional, with the top level being the most powerful one.
Counterfactuals are the building blocks of scientific thinking as well as legal and moral reasoning. In civil court, for example, the defendant is considered to be the culprit of an injury if, but for the defendant’s action, it is more likely than not that the injury would not have occurred. The computational meaning of “but for” calls for comparing the real world to an alternative world in which the defendant’s action did not take place.
Each layer in the hierarchy has a syntactic signature that characterizes the sentences admitted into that layer. For example, the association layer is characterized by conditional probability sentences, e.g., P(y|x) = p, stating that the probability of event Y = y, given that we observed event X = x, is equal to p. In large systems, such evidential sentences can be computed efficiently using Bayesian Networks, or any of the neural networks that support deep-learning systems.
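A level-1 computation of the toothpaste/floss kind can be sketched in a few lines: the conditional frequency is estimated straight from purchase records (made-up here), with no causal knowledge used or produced.

```python
# Level-1 association only: estimate P(floss = 1 | toothpaste = 1) by
# conditional frequency over toy purchase records. Purely statistical.

purchases = [                      # (bought_toothpaste, bought_floss)
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0),
]

n_tp   = sum(1 for tp, _ in purchases if tp)           # toothpaste buyers
n_both = sum(1 for tp, fl in purchases if tp and fl)   # ... who also buy floss
print(n_both / n_tp)  # 0.75 under these records
```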
At the interventional layer we find sentences of the type P(y|do(x), z), which denotes “the probability of event Y = y given that we intervene and set the value of X to x and subsequently observe event Z = z.” Such expressions can be estimated experimentally from randomized trials or analytically using Causal Bayesian Networks (Pearl, 2000, Chapter 3). A child learns the effects of interventions through playful manipulation of the environment (usually in a deterministic playground), and AI planners obtain interventional knowledge by exercising their designated sets of actions. Interventional expressions cannot be inferred from passive observations alone, regardless of how big the data.
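The gap between seeing and doing can be made concrete with a toy structural causal model (my own example, not one from the text): a hidden confounder Z drives both X and Y, so conditioning on X = 1 and setting X = 1 yield different probabilities for Y = 1, and no amount of purely observational data on X and Y closes the gap.

```python
import random

# Toy SCM (illustrative): Z -> X, Z -> Y, X -> Y, with Z hidden.
# Intervening on X severs the Z -> X edge; conditioning does not.

random.seed(0)

def sample(do_x=None):
    z = 1 if random.random() < 0.5 else 0                        # confounder
    if do_x is None:
        x = 1 if random.random() < (0.9 if z else 0.1) else 0    # seeing
    else:
        x = do_x                                                 # doing
    y = 1 if random.random() < 0.1 + 0.4 * x + 0.4 * z else 0
    return x, y

n = 200_000
obs = [sample() for _ in range(n)]
p_obs = sum(y for x, y in obs if x == 1) / sum(1 for x, y in obs if x == 1)
p_do  = sum(y for _, y in (sample(do_x=1) for _ in range(n))) / n

print(round(p_obs, 2), round(p_do, 2))  # ~0.86 (seeing) vs ~0.70 (doing)
```

Analytically, P(y|x = 1) = 0.5 + 0.4·P(z = 1|x = 1) = 0.86, while P(y|do(x = 1)) = 0.5 + 0.4·P(z = 1) = 0.70: conditioning inflates the estimate because X = 1 is itself evidence about Z.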
Finally, at the counterfactual level, we have expressions of the type P(y_x|x′, y′), which stand for “the probability that event Y = y would be observed had X been x, given that we actually observed X to be x′ and Y to be y′. For example, the probability that Joe’s salary would be y had he finished college, given that his actual salary is y′ and that he had only two years of college.” Such sentences can be computed only when we possess functional or Structural Equation models, or properties of such models (Pearl, 2000, Chapter 7).
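For the Joe example, the standard three-step counterfactual computation (abduction, action, prediction) can be sketched with toy numbers of my own choosing; the structural equation below is an illustrative assumption, not anything given in the text.

```python
# Three-step counterfactual computation on an assumed structural equation
# for the Joe example. All numbers are invented for illustration.

def salary(edu_years, u):
    """Assumed structural equation: base pay + education premium + noise u."""
    return 20_000 + 5_000 * edu_years + u

# Observed world: Joe had 2 years of college and earns 41,000.
edu_obs, y_obs = 2, 41_000

# 1. Abduction: infer Joe's individual noise term u from what was observed.
u = y_obs - salary(edu_obs, 0)   # u = 11,000

# 2. Action: replace the education assignment with edu := 4 (finished college).
# 3. Prediction: recompute salary under the same individual u.
y_cf = salary(4, u)
print(y_cf)  # 51,000: Joe's salary had he finished college
```

Step 1 is what makes this level-3: it uses the actually observed (x′, y′) to pin down the individual-specific background factors before the hypothetical action is applied.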
This hierarchy, and the formal restrictions it entails, explains why statistics-based machine learning systems are prevented from reasoning about actions, experiments and explanations. It also informs us what extra-statistical information is needed, and in what format, in order to support those modes of reasoning.
研究人员常常惊讶地发现,这种层次结构将深度学习的令人印象深刻的成就降级到“关联”的水平,与教科书上的曲线拟合练习并列。一种反对这种比较的流行观点认为,曲线拟合的目标是最大化“拟合”,而在深度学习中,我们试图最小化“过度拟合”。不幸的是,将层次结构中的三个层次分开的理论障碍告诉我们,我们的目标函数的性质并不重要。只要我们的系统优化了观察到的数据的某些属性,无论它多么高尚或复杂,同时不参考数据外部的世界,我们就会回到层次结构的第一级,并承受这一级别所带来的所有限制。
Researchers are often surprised that the hierarchy relegates the impressive achievements of deep learning to the level of Association, side by side with textbook curve-fitting exercises. A popular stance against this comparison argues that, whereas the objective of curve-fitting is to maximize “fit,” in deep learning we try to minimize “overfit.” Unfortunately, the theoretical barriers that separate the three layers in the hierarchy tell us that the nature of our objective function does not matter. As long as our system optimizes some property of the observed data, however noble or sophisticated, while making no reference to the world outside the data, we are back to level 1 of the hierarchy, with all the limitations that this level entails.
Consider the following five questions:
The common feature of these questions is that they are concerned with cause-and-effect relationships. We can recognize them through words such as “preventing,” “cause,” “attributed to,” “discrimination,” and “should I.” Such words are common in everyday language, and our society constantly demands answers to such questions. Yet, until very recently science gave us no means even to articulate them, let alone answer them. Unlike the rules of geometry, mechanics, optics or probabilities, the rules of cause and effect have been denied the benefits of mathematical analysis.
To appreciate the extent of this denial, readers would be stunned to know that only a few decades ago scientists were unable to write down a mathematical equation for the obvious fact that “mud does not cause rain.” Even today, only the top echelon of the scientific community can write such an equation and formally distinguish “mud causes rain” from “rain causes mud.” And you would probably be even more surprised to discover that your favorite college professor is not among them.
Things have changed dramatically in the past three decades. A mathematical language has been developed for managing causes and effects, accompanied by a set of tools that turn causal analysis into a mathematical game, not unlike solving algebraic equations, or finding proofs in high-school geometry. These tools permit us to express causal questions formally, codify our existing knowledge in both diagrammatic and algebraic forms, and then leverage our data to estimate the answers. Moreover, the theory warns us when the state of existing knowledge or the available data are insufficient to answer our questions; and then suggests additional sources of knowledge or data to make the questions answerable.
Harvard professor Garry King gave this transformation a historical perspective: “More has been learned about causal inference in the last few decades than the sum total of everything that had been learned about it in all prior recorded history” (Morgan and Winship, 2015). I call this transformation “The Causal Revolution,” (Pearl and Mackenzie, 2018) and the mathematical framework that led to it I call “Structural Causal Models (SCM).”
The SCM deploys three parts:
1. Graphical models,
2. Structural equations, and
3. Counterfactual and interventional logic
Graphical models serve as a language for representing what we know about the world, counterfactuals help us articulate what we want to know, and structural equations serve to tie the two together in a solid semantics.
Figure 18.2 illustrates the operation of SCM in the form of an inference engine. The engine accepts three inputs: Assumptions, Queries, and Data, and produces three outputs: Estimand, Estimate, and Fit indices. The Estimand (ES) is a mathematical formula that, based on the Assumptions, provides a recipe for answering the Query from any hypothetical data, whenever they are available. After receiving the Data, the engine uses the Estimand to produce an actual Estimate (ÊS) for the answer, along with statistical estimates of the confidence in that answer (to reflect the limited size of the data set, as well as possible measurement errors or missing data). Finally, the engine produces a list of “fit indices” which measure how compatible the data are with the Assumptions conveyed by the model.
Figure 18.2
How the SCM “inference engine” combines data with causal model (or assumptions) to produce answers to queries of interest.
To exemplify these operations, let us assume that our Query stands for the causal effect of X on Y, written Q = P(Y|do(X)), where X and Y are two variables of interest. Let the modeling assumptions be encoded in the graph below, where Z is a third variable affecting both
X and Y. Finally, let the data be sampled at random from a joint distribution P(X, Y, Z). The Estimand (ES) calculated by the engine will be the formula ES = Σz P(Y | X, Z = z)P(Z = z). It defines a property of P(X, Y, Z) that, if estimated, would provide a correct answer to our Query. The answer itself, the Estimate ÊS, can be produced by any number of techniques that produce a consistent estimate of ES from finite samples of P(X, Y, Z). For example, the sample average (of Y) over all cases satisfying the specified X and Z conditions would be a consistent estimate. But more efficient estimation techniques can be devised to overcome data sparsity (Rosenbaum and Rubin, 1983). This is where deep learning excels and where most work in machine learning has been focused, albeit with no guidance of a model-based estimand. Finally, the Fit Index in our example will be NULL. In other words, after examining the structure of the graph, the engine should conclude that the encoded assumptions do not have any testable implications. Therefore, the veracity of the resultant estimate must lean entirely on the assumptions encoded in the graph – no refutation or corroboration can be obtained from the data.2
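As a sketch of the estimation step, the following snippet simulates data from a hypothetical binary model with this graph (Z influencing both X and Y) and evaluates the adjustment estimand Σz P(Y = 1 | X, Z = z)P(Z = z) on the finite sample. Every parameter value is invented for illustration.

```python
import random

random.seed(1)

# Hypothetical data-generating model over binary Z, X, Y with Z -> X, Z -> Y, X -> Y.
data = []
for _ in range(100_000):
    z = random.random() < 0.5
    x = random.random() < (0.8 if z else 0.2)        # Z confounds X
    y = random.random() < (0.2 + 0.4 * x + 0.3 * z)  # true effect of X on Y is 0.4
    data.append((z, x, y))

def adjusted(x_val):
    """Back-door adjustment: sum over z of P(y=1 | x, z) * P(z)."""
    total = 0.0
    for z_val in (False, True):
        stratum = [y for (z, x, y) in data if z == z_val and x == x_val]
        p_y = sum(stratum) / len(stratum)
        p_z = sum(1 for (z, _, _) in data if z == z_val) / len(data)
        total += p_y * p_z
    return total

# Estimate of P(y=1 | do(x=1)) - P(y=1 | do(x=0)); should come out close to 0.4.
effect = adjusted(True) - adjusted(False)
print(round(effect, 2))
```

The naive contrast P(y = 1 | x = 1) − P(y = 1 | x = 0) on the same data overstates the effect, since high-Z cases are over-represented among x = 1; the adjustment formula removes exactly that bias.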
The same procedure applies to more sophisticated queries, for example, the counterfactual query Q = P(yx|x′, y′) discussed before. We may also permit some of the data to arrive from controlled experiments, which would take the form P(V|do(W)), in case W is the controlled variable. The role of the Estimand would remain that of converting the Query into the syntactic format of the available data and, then, guiding the choice of the estimation technique to ensure unbiased estimates. Needless to say, the conversion task is not always feasible, in which case the Query will be declared “non-identifiable” and the engine should exit with FAILURE. Fortunately, efficient and complete algorithms have been developed to decide identifiability and to produce estimands for a variety of counterfactual queries and a variety of data types (Bareinboim and Pearl, 2016). Next we provide a bird’s-eye view of seven accomplishments of the SCM framework and discuss the unique contribution that each pillar brings to the art of automated reasoning.
The task of encoding assumptions in a compact and usable form is not a trivial matter once we take seriously the requirements of transparency and testability.3 Transparency enables analysts to discern whether the assumptions encoded are plausible (on scientific grounds), or whether additional assumptions are warranted. Testability permits us (be it an analyst or a machine) to determine whether the assumptions encoded are compatible with the available data and, if not, identify those that need repair.
Advances in graphical models have made compact encoding feasible. Their transparency stems naturally from the fact that all assumptions are encoded graphically, mirroring the way researchers perceive cause-effect relationships in the domain; judgments about counterfactual or statistical dependencies are not required, since these can be read off the structure of the graph. Testability is facilitated through a graphical criterion called d-separation, which provides the fundamental connection between causes and probabilities. It tells us, for any given pattern of paths in the model, what pattern of dependencies we should expect to find in the data (Pearl, 1988).
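For example, in the chain X → Z → Y, d-separation says that Z blocks the only path between X and Y, so the graph predicts that X and Y are dependent marginally but independent once Z is held fixed. A quick simulation (parameters invented for illustration) displays exactly that pattern:

```python
import random

random.seed(2)

# Chain X -> Z -> Y over binary variables (all probabilities hypothetical).
rows = []
for _ in range(200_000):
    x = random.random() < 0.5
    z = random.random() < (0.9 if x else 0.1)
    y = random.random() < (0.9 if z else 0.1)
    rows.append((x, z, y))

def p_y(cond):
    """Estimate P(y=1) among rows satisfying the given condition on (x, z)."""
    sel = [y for (x, z, y) in rows if cond(x, z)]
    return sum(sel) / len(sel)

marginal_gap = p_y(lambda x, z: x) - p_y(lambda x, z: not x)                # large
stratified_gap = p_y(lambda x, z: x and z) - p_y(lambda x, z: not x and z)  # near zero
print(round(marginal_gap, 2), round(stratified_gap, 2))
```

Had the arrows instead formed a collider, X → Z ← Y, d-separation would predict the opposite pattern: independence marginally, dependence once we condition on Z.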
Confounding, or the presence of unobserved causes of two or more variables, has long been considered the major obstacle to drawing causal inferences from data. This obstacle has been demystified and “deconfounded” through a graphical criterion called “back-door.” In particular, the task of selecting an appropriate set of covariates to control for confounding has been reduced to a simple “roadblocks” puzzle manageable by a simple algorithm (Pearl, 1993).
For models where the “back-door” criterion does not hold, a symbolic engine is available, called do-calculus, which predicts the effect of policy interventions whenever feasible, and exits with failure whenever predictions cannot be ascertained with the specified assumptions (Pearl, 1995; Tian and Pearl, 2002; Shpitser and Pearl, 2008).
Counterfactual analysis deals with the behavior of specific individuals, identified by a distinct set of characteristics. For example, given that Joe’s salary is Y = y, and that he went X = x years to college, what would Joe’s salary be had he had one more year of education?
One of the crown achievements of the Causal Revolution has been to formalize counterfactual reasoning within the graphical representation, the very representation researchers use to encode scientific knowledge. Every structural equation model determines the truth value of every counterfactual sentence. Therefore, we can determine analytically whether the probability of the sentence is estimable from experimental or observational studies, or a combination thereof (Balke and Pearl, 2011; Pearl, 2000, Chapter 7). Of special interest in causal discourse are counterfactual questions concerning “causes of effects,” as opposed to “effects of causes.” For example, how likely is it that Joe’s swimming exercise was a necessary (or sufficient) cause of Joe’s death (Pearl, 2015a; Halpern and Pearl, 2005)?
Mediation analysis concerns the mechanisms that transmit changes from a cause to its effects. The identification of such intermediate mechanisms is essential for generating explanations, and counterfactual analysis must be invoked to facilitate this identification. The graphical representation of counterfactuals enables us to define direct and indirect effects and to decide when these effects are estimable from data or experiments (Robins and Greenland, 1992; Pearl, 2001; VanderWeele, 2015). A typical query answerable by this analysis is: What fraction of the effect of X on Y is mediated by variable Z?
The validity of every experimental study is challenged by disparities between the experimental and implementational setups. A machine trained in one environment cannot be expected to perform well when environmental conditions change, unless the changes are localized and identified. This problem and its various manifestations are well recognized by machine-learning researchers, and enterprises such as “domain adaptation,” “transfer learning,” “lifelong learning,” and “explainable AI” are just some of the subtasks identified by researchers and funding agencies in an attempt to alleviate the general problem of robustness. Unfortunately, the problem of robustness requires a causal model of the environment and cannot be handled at the level of Association, at which most remedies have been tried. Associations are not sufficient for identifying the mechanisms affected by the changes that occurred. The do-calculus discussed above now offers a complete methodology for overcoming bias due to environmental changes. It can be used both for re-adjusting learned policies to circumvent environmental changes and for controlling bias due to non-representative samples (Bareinboim and Pearl, 2016).
Problems of missing data plague every branch of experimental science. Respondents do not answer every item on a questionnaire, sensors fade as environmental conditions change, and patients often drop out of clinical studies for unknown reasons. The rich literature on this problem is wedded to a model-blind paradigm of statistical analysis and, accordingly, is severely limited to situations where missingness occurs at random, that is, independently of the values taken by other variables in the model. Using causal models of the missingness process, we can now formalize the conditions under which causal and probabilistic relationships can be recovered from incomplete data and, whenever the conditions are satisfied, produce a consistent estimate of the desired relationship (Mohan and Pearl, 2017).
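One such recoverability condition can be sketched directly: if missingness of Y depends only on a fully observed variable Z, the mean of Y is recoverable by inverse-probability weighting, even though the naive mean over observed cases is biased. All numbers below are invented for illustration:

```python
import random

random.seed(3)

# Hypothetical survey: income Y is determined by group Z (true mean = 0.3*80 + 0.7*40 = 52),
# and respondents in group Z=1 rarely report it. Missingness depends only on observed Z.
rows = []
for _ in range(200_000):
    z = random.random() < 0.3
    y = 80.0 if z else 40.0
    reported = random.random() < (0.2 if z else 0.9)
    rows.append((z, y, reported))

observed = [(z, y) for (z, y, r) in rows if r]
naive = sum(y for _, y in observed) / len(observed)   # biased toward group Z=0

def p_report(z_val):
    """Estimate P(reported | Z=z) from the data."""
    cell = [r for (z, _, r) in rows if z == z_val]
    return sum(cell) / len(cell)

# Weight each observed case by the inverse of its estimated response probability.
ipw = sum(y / p_report(z) for z, y in observed) / len(rows)
print(round(naive, 1), round(ipw, 1))   # naive is well below the true mean; ipw is near 52
```

A model-blind paradigm would treat this situation as unrecoverable, since the data are not missing completely at random; it is the causal model of why values go missing that licenses the correction.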
The d-separation criterion described above enables us to detect and enumerate the testable implications of a given causal model. This opens the possibility of inferring, with mild assumptions, the set of models that are compatible with the data, and of representing this set compactly. Systematic searches have been developed which, in certain circumstances, can prune the set of compatible models significantly, to the point where causal queries can be estimated directly from that set (Spirtes et al., 2000; Pearl, 2000; Peters et al., 2017).
The philosopher Stephen Toulmin (1961) identifies the model-based vs. model-blind dichotomy as the key to understanding the ancient rivalry between Babylonian and Greek science. According to Toulmin, the Babylonian astronomers were masters of black-box prediction, far surpassing their Greek rivals in accuracy and consistency (Toulmin, 1961, pp. 27–30). Yet science favored the creative-speculative strategy of the Greek astronomers, which was wild with metaphysical imagery: circular tubes full of fire, small holes through which celestial fire was visible as stars, and a hemispherical earth riding on turtle backs. Yet it was this wild modeling strategy, not Babylonian rigidity, that jolted Eratosthenes (276–194 BC) to perform one of the most creative experiments in the ancient world and measure the radius of the earth. That experiment would never have occurred to a Babylonian curve-fitter.
Coming back to strong AI, we have seen that model-blind approaches have intrinsic limitations on the cognitive tasks that they can perform. We have described some of these tasks, demonstrated how they can be accomplished in the SCM framework, and explained why a model-based approach is essential for performing them. Our general conclusion is that human-level AI cannot emerge solely from model-blind learning machines; it requires the symbiotic collaboration of data and models.
Data science is only as much of a science as it facilitates the interpretation of data – a two-body problem, connecting data to reality. Data alone are hardly a science, regardless of how big they get and how skillfully they are manipulated.
1. Other names used for inferences at this layer are: “model-free,” “model-blind,” “black-box,” or “data-centric.” Darwiche (2018) used “function-fitting,” for it amounts to fitting data by a complex function defined by the neural network architecture.
2. The assumptions encoded in the graph are conveyed by its missing arrows. For example, Y does not influence X or Z, X does not influence Z and, most importantly, Z is the only variable affecting both X and Y. That these assumptions lack testable implications can be concluded from the fact that the graph is complete, i.e., no edges are missing.
3. Economists, for example, having chosen algebraic over graphical representations, are deprived of elementary testability-detecting features (Pearl, 2015b).
There has long been a fruitful interchange between biology and artificial intelligence (AI), with neuroscience in particular leading the way. At least since McCulloch and Pitts (1943) postulated both a simplified model of the neuron and an accompanying “logical calculus” for its activity, neural networks have been proposed both as descriptive models of the brain and as basic building blocks for AI. The interplay between computation and psychology is still very much alive today. Indeed, computational neuroscience is now a thriving field of its own.
In his classic “The Architecture of Mind: A Connectionist Approach” (chapter 19), David Rumelhart makes a case for the utility of neural networks in understanding cognition. His demonstration of the power of what are (by today’s standards) quite primitive parallel distributed processing (PDP) networks was a high-water mark for neurally inspired approaches to cognition. We include it here both for its clear exposition of the fundamentals and for the way it captures the early enthusiasm for the connectionist approach.
In excerpts from The Computational Brain (1992), Churchland and Sejnowski exemplify the other direction (in chapter 20), showing how a computational approach has been fruitful for understanding the brain. In this seminal work on the philosophy of neuroscience, they connect the abstract language of computation with the comparatively messy details of how brains work. They also lay out fundamental assumptions about brain function, such as the idea that brain mechanisms typically span multiple levels of organization and can be described at different (Marrian) levels of analysis.
Cowie and Woodward’s “The Mind Is Not (Just) a System of Modules Shaped (Just) by Natural Selection” (chapter 21) considers whether there are good reasons to believe that the mind is composed of largely causally independent and informationally encapsulated modules. In doing so, they bring together issues of modularity with considerations about the correct role for evolutionary psychology in the understanding of the mind. Evolution is, of course, the original Mind Designer—but how to understand that relation, much less how it might affect our understanding of artificial systems, remains a hotly contested topic. Perhaps more centrally, the question of whether the mind might simply be an assemblage of relatively simple, domain-specific modules, has direct bearing on the kinds of artificial systems that might plausibly implement human-like intelligence.
The papers in this part of the book are naturally combined with those in part IV; which to read first is more a matter of emphasis than of strict division. A reader focusing on intelligence might combine these papers with those in part II, asking whether there is something distinctive about human intelligence that depends constitutively on how human cognition is implemented. The idea that intelligence evolved to deal with specific evolutionary problems is also a theme of some of the papers in part VI.
Historical Connections. There have been several useful recent discussions about the historical link between neuroscience and computation:
On Connectionism. In the last edition of Mind Design, a debate raged about whether connectionist architectures provide a strong alternative to classical computation. As much of this debate was focused on PDP and its limitations—many of which have since been transcended—we felt it was best to elide them. But the debates are far from settled. Consider the following:
We also removed a different line of discussion, though it seems to us very much alive, about how computationalist models of mind should affect how we think about minds and how we theorize about them. Consider the following:
Modularity of Mind and Brain. Much of the debate around modularity turns on what one thinks modules are in the first place. The classic source is the beginning of Fodor (1983) (later sections of which are excerpted as chapter 9 of this volume), which presents a fairly rigid list of criteria for modularity. Various authors relax these assumptions, usually in the service of defending a “massive modularity” hypothesis:
Function and Decomposition. When debating modularity, one subtle issue has to do with whether modules are functional parts of systems or mechanical, spatiotemporally isolable parts as well. Many discussions of modularity in neuroscience assume that both must be the case, but it’s clear that the two come apart. Consider the stock functionalist example of the carburetor. There is a functional step of carburetion—that is, of atomizing gasoline and mixing it with an appropriate portion of air to be burned. Some cars also have carburetors: specific parts, the function of which is to carburate. However, you’d be hard-pressed to find a car with a carburetor today: advances in fuel injection and environmental controls mean that carburetion is done by the joint action of several parts, most of which serve other functions as well. When asking about modularity, therefore, it’s always worth asking what sort of decomposition one means. Consider adding the following to your reading:
David E. Rumelhart
1989
Cognitive science has a long-standing and important relationship to the computer. The computer has provided a tool whereby we have been able to express our theories of mental activity; it has been a valuable source of metaphors through which we have come to understand and appreciate how mental activities might arise out of the operations of simple-component processing elements.
I recall vividly a class I taught some fifteen years ago in which I outlined the then-current view of the cognitive system. A particularly skeptical student challenged my account, with its reliance on concepts drawn from computer science and artificial intelligence, with the question of whether I thought my theories would be different if it had happened that our computers were parallel instead of serial. My response, as I recall, was to concede that our theories might very well be different, but to argue that that wasn’t a bad thing. I pointed out that the inspiration for our theories and our understanding of abstract phenomena always is based on our experience with the technology of the time. I pointed out that Aristotle had a wax tablet theory of memory, that Leibniz saw the universe as clockworks, that Freud used a hydraulic model of libido flowing through the system, and that the telephone-switchboard model of intelligence had played an important role as well. The theories posited by those of previous generations had, I suggested, been useful in spite of the fact that they were based on the metaphors of their time. Therefore, I argued, it was natural that in our generation—the generation of the serial computer—we should draw our insights from analogies with the most advanced technological developments of our time. I don’t now remember whether my response satisfied the student, but I have no doubt that we in cognitive science have gained much of value through our use of concepts drawn from our experience with the computer.
In addition to its value as a source of metaphors, the computer differs from earlier technologies in another remarkable way. The computer can be made to simulate systems whose operations are very different from the computers on which these simulations run. In this way we can use the computer to simulate systems with which we wish to have experience and thereby provide a source of experience that can be drawn upon in giving us new metaphors and new insights into how mental operations might be accomplished. It is this use of the computer that the connectionists have employed. The architecture that we are exploring is not one based on the von Neumann architecture of our current generation of computers but rather an architecture based on considerations of how brains themselves might function. Our strategy has thus become one of offering a general and abstract model of the computational architecture of brains, to develop algorithms and procedures well suited to this architecture, to simulate these procedures and architecture on a computer, and to explore them as hypotheses about the nature of the human information-processing system. We say that such models are neurally inspired, and we call computation on such a system brain-style computation. Our goal in short is to replace the computer metaphor with the brain metaphor.
Why should a brain-style computer be an especially interesting source of inspiration? Implicit in the adoption of the computer metaphor is an assumption about the appropriate level of explanation in cognitive science. The basic assumption is that we should seek explanation at the program or functional level rather than the implementation level. Thus, it is often pointed out that we can learn very little about what kind of program a particular computer may be running by looking at the electronics. In fact we don’t care much about the details of the computer at all; all we care about is the particular program it is running. If we know the program, we know how the system will behave in any situation. It doesn’t matter whether we use vacuum tubes or transistors, whether we use an IBM or an Apple, the essential characteristics are the same. This is a very misleading analogy. It is true for computers because they are all essentially the same. Whether we make them out of vacuum tubes or transistors, and whether we use an IBM or an Apple computer, we are using computers of the same general design. But, when we look at an essentially different architecture, we see that the architecture makes a good deal of difference. It is the architecture that determines which kinds of algorithms are most easily carried out on the machine in question. It is the architecture of the machine that determines the essential nature of the program itself. It is thus reasonable that we should begin by asking what we know about the architecture of the brain and how it might shape the algorithms underlying biological intelligence and human mental life.
The basic strategy of the connectionist approach is to take as its fundamental processing unit something close to an abstract neuron. We imagine that computation is carried out through simple interactions among such processing units. Essentially the idea is that these processing elements communicate by sending numbers along the lines that connect the processing elements. This identification already provides some interesting constraints on the kinds of algorithms that might underlie human intelligence.
The operations in our models then can best be characterized as “neurally-inspired”. How does the replacement of the computer metaphor with the brain metaphor as model of mind affect our thinking? This change in orientation leads us to a number of considerations that further inform and constrain our model-building efforts. Perhaps the most crucial of these is time. Neurons are remarkably slow relative to components in modern computers. Neurons operate in the time scale of milliseconds, whereas computer components operate in the time scale of nanoseconds—a factor of 10^6 faster. This means that human processes that take on the order of a second or less can involve only a hundred or so time steps. Because most of the processes we have studied—perception, memory retrieval, speech processing, sentence comprehension, and the like—take about a second or so, it makes sense to impose what Feldman (1985) calls the “100-step-program” constraint. That is, we seek explanations for these mental phenomena that do not require more than about a hundred elementary sequential operations. Given that the processes we seek to characterize are often quite complex and may involve consideration of large numbers of simultaneous constraints, our algorithms must involve considerable parallelism. Thus although a serial computer could be created out of the kinds of components represented by our units, such an implementation would surely violate the 100-step-program constraint for any but the simplest processes. Some might argue that, although parallelism is obviously present in much of human information processing, this fact alone need not greatly modify our world view. This is unlikely. The speed of components is a critical design constraint. Although the brain has slow components, it has very many of them. The human brain contains billions of such processing elements.
Rather than organize computation with many, many serial steps, as we do with systems whose steps are very fast, the brain must deploy many, many processing elements cooperatively and in parallel to carry out its activities. These design characteristics, among others, lead, I believe, to a general organization of computing that is fundamentally different from what we are used to.
A further consideration differentiates our models from those inspired by the computer metaphor—that is, the constraint that all the knowledge is in the connections. From conventional programmable computers we are used to thinking of knowledge as being stored in the states of certain units in the system. In our systems we assume that only very short-term storage can occur in the states of units; long-term storage takes place in the connections among units. Indeed it is the connections—or perhaps the rules for forming them through experience—that primarily differentiate one model from another. This is a profound difference between our approach and other more conventional approaches, for it means that almost all knowledge is implicit in the structure of the device that carries out the task, rather than explicit in the states of units themselves. Knowledge is not directly accessible to interpretation by some separate processor, but it is built into the processor itself and directly determines the course of processing. It is acquired through tuning of connections, as they are used in processing, rather than formulated and stored as declarative facts.
These and other neurally inspired classes of working assumptions have been one important source of assumptions underlying the connectionist program of research. These have not been the only considerations. A second class of constraints arises from our beliefs about the nature of human information processing considered at a more abstract, computational level of analysis. We see the kinds of phenomena we have been studying as products of a kind of constraint-satisfaction procedure in which a very large number of constraints act simultaneously to produce the behavior. Thus we see most behavior not as the product of a single, separate component of the cognitive system but as the product of a large set of interacting components, each mutually constraining the others and contributing in its own way to the globally observable behavior of the system. It is very difficult to use serial algorithms to implement such a conception but very natural to use highly parallel ones. These problems can often be characterized as best-match or optimization problems. As Minsky and Papert (1969) have pointed out, it is very difficult to solve best-match problems serially. This is precisely the kind of problem, however, that is readily implemented using highly parallel algorithms of the kind we have been studying.
The use of brain-style computational systems, then, offers not only a hope that we can characterize how brains actually carry out certain information-processing tasks but also solutions to computational problems that seem difficult to solve in more traditional computational frameworks. It is here where the ultimate value of connectionist systems must be evaluated.
In this chapter, I begin with a somewhat more formal sketch of the computational framework of connectionist models. I then follow with a general discussion of the kinds of computational problems that connectionist models seem best suited for. Finally, I briefly review the state of the art in connectionist modeling.
There are seven major components of any connectionist system:

(1) a set of processing units;
(2) a state of activation;
(3) an output function for each unit;
(4) a pattern of connectivity among units;
(5) an activation rule for combining the inputs impinging on a unit with its current state to produce a new level of activation;
(6) a learning rule whereby patterns of connectivity are modified by experience;
(7) an environment within which the system must operate.
Figure 19.1 illustrates the basic aspects of these systems. There is a set of processing units, generally indicated by circles in my diagrams; at each point in time each unit ui has an activation value, denoted in the diagram as ai(t); this activation value is passed through a function fi to produce an output value oi(t). This output value can be seen as passing through a set of unidirectional connections (indicated by lines or arrows in the diagrams) to other units in the system. There is associated with each connection a real number, usually called the weight or strength of the connection, designated wij (to unit i, from unit j), which determines how strongly the former is affected by the latter. All of the inputs must then be combined; and the combined inputs to a unit (usually designated the net input to that unit), along with its current activation value, determine its new activation value via a function F. These systems are viewed as being plastic in the sense that the pattern of interconnections is not fixed for all time; rather the weights can undergo modification as a function of experience. In this way the system can evolve. What a unit represents can change with experience, and the system can come to perform in substantially different ways.
Figure 19.1
The basic parts of a parallel distributed processing system.
THE SET OF PROCESSING UNITS. Any connectionist system begins with a set of processing units. Specifying the set of processing units and what they represent is typically the first stage of specifying a connectionist model. In some systems these units may represent particular conceptual objects such as features, letters, words, or concepts; in others they are simply abstract elements over which meaningful patterns can be defined. When we speak of a distributed representation, we mean one in which the units represent small, feature-like entities we call microfeatures. In this case it is the pattern as a whole that is the meaningful level of analysis. This should be contrasted to a one-unit-one-concept or localist representational system, in which single units represent entire concepts or other large meaningful entities.
All of the processing of a connectionist system is carried out by these units. There is no executive or other overseer. There are only relatively simple units, each doing its own relatively simple job. A unit’s job is simply to receive input from its neighbors and, as a function of the inputs it receives, to compute an output value, which it sends to its neighbors. The system is inherently parallel in that many units can carry out their computations at the same time.
Within any system we are modeling, it is useful to characterize three types of units: input, output, and hidden units. Input units receive inputs from sources external to the system under study. These inputs may be either sensory inputs or inputs from other parts of the processing system in which the model is embedded. The output units send signals out of the system. They may either directly affect motoric systems or simply influence other systems external to the ones we are modeling. The hidden units are those whose only inputs and outputs are within the system we are modeling. They are not “visible” to outside systems.
THE STATE OF ACTIVATION. In addition to the set of units we need a representation of the state of the system at time t. This is primarily specified by a vector a(t), representing the pattern of activation over the set of processing units. Each element of the vector stands for the activation of one of the units. It is the pattern of activation over the whole set of units that captures what the system is representing at any time. It is useful to see processing in the system as the evolution, through time, of a pattern of activity over the set of units.
Different models make different assumptions about the activation values a unit is allowed to take on. Activation values may be continuous or discrete. If they are continuous, they may be unbounded or bounded. If they are discrete, they may take binary values or any of a small set of values. Thus in some models units are continuous and may take on any real number as an activation value. In other cases they may take on any real value between some minimum and maximum such as, for example, the interval [0,1]. When activation values are restricted to discrete values, they most often are binary—such as the values 0 and 1, where 1 is usually taken to mean that the unit is active and 0 is taken to mean that it is inactive.
THE OUTPUT FUNCTION. Units interact by transmitting signals to their neighbors. The strengths of their signals, and therefore the degrees to which they affect their neighbors, are determined by their levels of activation. Associated with each unit ui is an output function fi(ai(t)), which maps the current state of activation to an output signal oi(t). In some of our models, the output level is exactly equal to the activation level of the unit. In this case, f is the identity function f(x) = x. Sometimes f is some sort of threshold function, so that a unit has no effect on another unit unless its activation exceeds a certain value. Sometimes the function f is assumed to be a stochastic function in which the output of the unit depends probabilistically on its activation level.
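The three kinds of output function just described — identity, threshold, and stochastic — can be sketched as follows. This is a minimal illustration, not the chapter's own notation: the function names, the threshold value theta, and the use of a seeded random generator are all assumptions made for the example.

```python
import random

def identity_output(a):
    # f(x) = x: the output level exactly equals the activation level.
    return a

def threshold_output(a, theta=0.5):
    # The unit has no effect on its neighbors unless its
    # activation exceeds the threshold theta.
    return a if a > theta else 0.0

def stochastic_output(a, rng):
    # The output depends probabilistically on the activation level:
    # here, an all-or-none output of 1 with probability a.
    return 1.0 if rng.random() < a else 0.0

print(identity_output(0.3))    # 0.3
print(threshold_output(0.3))   # 0.0 (activation below theta)
print(threshold_output(0.8))   # 0.8 (activation above theta)
```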
THE PATTERN OF CONNECTIVITY. Units are connected to one another. It is this pattern of connectivity that constitutes what the system knows and determines how it will respond to any arbitrary input. Specifying the processing system and the knowledge encoded therein is, in a connectionist model, a matter of specifying this pattern of connectivity among the processing units.
In many cases we assume that each unit provides an additive contribution to the input of the units to which it is connected. In such cases the total input to any unit is simply the weighted sum of the separate inputs from each of the units connected to it. That is, the inputs from all of the incoming units are simply multiplied by their respective connection weights and summed to get the overall input to that unit. In this case the total pattern of connectivity can be represented by merely specifying the weights for each of the connections in the system. A positive weight represents an excitatory input, and a negative weight represents an inhibitory input. It is often convenient to represent such a pattern of connectivity by a weight matrix W in which the entry wij represents the strength and sense of the connection to unit ui from unit uj. The weight wij is a positive number if unit uj excites unit ui; it is a negative number if unit uj inhibits unit ui; and it is 0 if unit uj has no direct connection to unit ui. The absolute value of wij specifies the strength of the connection.
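Under the additive assumption just described, the net input to unit i is simply the weighted sum over j of wij·oj. A minimal sketch, where the 3-unit weight matrix and output values are made-up illustrative numbers:

```python
def net_input(W, o, i):
    # Net input to unit i: the sum over j of w[i][j] * o[j].
    return sum(w_ij * o_j for w_ij, o_j in zip(W[i], o))

# Row i of W holds the weights *to* unit i; positive entries are
# excitatory, negative entries inhibitory, and 0 means no connection.
W = [[ 0.0,  0.5, -1.0],
     [ 0.5,  0.0,  0.8],
     [-1.0,  0.8,  0.0]]
o = [1.0, 0.0, 1.0]          # current output values of the three units

print(net_input(W, o, 0))    # 0.0*1.0 + 0.5*0.0 + (-1.0)*1.0 = -1.0
```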
The pattern of connectivity is very important. It is this pattern that determines what each unit represents. One important issue that may determine both how much information can be stored and how much serial processing the network must perform is the fan-in and fan-out of a unit. The fan-in is the number of elements that either excite or inhibit a given unit. The fan-out is the number of units affected directly by a unit. It is useful to note that in brains these numbers are relatively large. Fan-in and fan-out range as high as 100,000 in some parts of the brain. It seems likely that this large fan-in and fan-out allows for a kind of operation that is less like a fixed circuit and more statistical in character.
THE ACTIVATION RULE. We also need a rule whereby the inputs impinging on a particular unit are combined with one another and with the current state of the unit to produce a new state of activation. We need a function Fi, which takes ai(t) and the net input neti(t) = Σj wij oj(t), and produces a new state of activation. In the simplest cases, when Fi is the identity function and depends only on the inputs, we can write ai(t + 1) = neti(t)—or, in vector notation for the whole network at once, a(t + 1) = net(t) = Wo(t). Sometimes F is a threshold function so that the net input must exceed some value before contributing to the new state of activation. Often the new state of activation depends on the old one as well as the current input. The function F itself is what we call the activation rule. Usually the function is assumed to be deterministic. Thus, for example, if a threshold is involved, it may be that ai(t) = 1 if the total input exceeds some threshold value, and equals 0 otherwise. Other times it is assumed that F is stochastic. Sometimes activations are assumed to decay slowly with time so that even with no external input the activation of a unit will simply decay and not go directly to zero. Whenever ai(t) is assumed to take on continuous values, it is common to assume that F is a kind of sigmoid (that is, S-shaped) function. In this case an individual unit can saturate and reach a minimum or maximum value of activation.
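One concrete instance of the continuous, saturating case described above is the logistic sigmoid applied to the net input. The specific formula below is a standard choice assumed for illustration, not the only possible F:

```python
import math

def F(net):
    # Sigmoid activation rule: squashes any net input into (0, 1),
    # so the unit saturates near a minimum of 0 and a maximum of 1.
    return 1.0 / (1.0 + math.exp(-net))

print(F(0.0))     # 0.5: the midpoint, with zero net input
print(F(10.0))    # close to 1: saturated near the maximum
print(F(-10.0))   # close to 0: saturated near the minimum
```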
THE LEARNING RULE: CHANGES AS A FUNCTION OF EXPERIENCE. Changing the processing or knowledge structure in a connectionist system involves modifying the patterns of interconnectivity. In principle this can involve three kinds of modification:
(1) development of new connections;
(2) loss of existing connections;
(3) modification of the strengths of connections that already exist.
Very little work has been done on (1) and (2). To a first order of approximation, however, (1) and (2) can be considered a special case of (3). Whenever we change the strength of connection away from zero to some positive or negative value, it has the same effect as growing a new connection. Whenever we change the strength of a connection to zero, that has the same effect as losing an existing connection. Thus we have concentrated on rules whereby strengths of connections are modified through experience.
Virtually all learning rules for models of this type can be considered variants of the Hebbian learning rule, suggested by Hebb in his classic book The Organization of Behavior (1949). Hebb’s basic idea is this: if a unit ui receives an input from another unit uj at a time when both units are highly active, then the weight wij to ui from uj should be strengthened. This idea has been extended and modified so that it can be stated more generally as

Δwij(t) = g(ai(t), τi(t)) h(oj(t), wij(t))
or, suppressing the time variables for easier readability, as

Δwij = g(ai, τi) h(oj, wij)
where τi is a kind of teaching input to ui. Simply stated, this equation says that the change in the connection to ui from uj is given by the product of a function g(…) of the activation of ui and its teaching input τi and another function h(…) of the output value of uj and the current connection strength wij. In the simplest versions of Hebbian learning, there is no teacher and the functions g and h are simply proportional to their first arguments. Thus we have

Δwij = 𝜖 ai oj
where 𝜖 is the constant of proportionality representing the learning rate. Another common variation is a rule in which

h(oj(t), wij(t)) = oj(t)
(as in the simplest case) but

g(ai(t), τi(t)) = 𝜖(τi(t) - ai(t))
This is often called the Widrow-Hoff rule, because it was originally formulated by Widrow and Hoff (1960), or the delta rule, because the amount of learning is proportional to the difference (or delta) between the actual activation achieved and the target activation provided by a teacher. In this case we have

Δwij(t) = 𝜖(τi(t) - ai(t)) oj(t)
This is a generalization of the perceptron learning rule for which the famous perceptron convergence theorem has been proved. Still another variation has

Δwij = 𝜖 ai (oj - wij)
This is a rule employed by Grossberg (1976) and others in the study of competitive learning. In this case usually only the units with the strongest activation values are allowed to learn.
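The three rules just surveyed — simple Hebbian, delta (Widrow-Hoff), and the Grossberg-style competitive variant — can be sketched side by side. The learning rate 𝜖 (eps) and the sample values are illustrative:

```python
def hebb(eps, a_i, o_j):
    # Simplest Hebbian rule: strengthen w_ij when both units are active.
    return eps * a_i * o_j

def delta_rule(eps, teach_i, a_i, o_j):
    # Widrow-Hoff / delta rule: the change is proportional to the
    # difference between the teaching input and the actual activation.
    return eps * (teach_i - a_i) * o_j

def competitive(eps, a_i, o_j, w_ij):
    # Grossberg-style rule: the weights of an active unit move
    # toward the current input pattern.
    return eps * a_i * (o_j - w_ij)

print(hebb(0.5, 1.0, 1.0))              # 0.5: both units fully active
print(delta_rule(0.5, 1.0, 0.5, 1.0))   # 0.25: activation falls short of target
print(competitive(0.5, 1.0, 1.0, 0.0))  # 0.5: weight pulled toward the input
```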
THE ENVIRONMENT. It is crucial in the development of any model to have a clear representation of the environment in which this model is to exist. For connectionist models, we represent the environment as a time-varying stochastic function over a space of possible input patterns. That is, for each possible input pattern, we imagine that there is some probability that, at any given time, that pattern is impinging on the input units. This probability function may in general depend on the history of inputs to the system as well as outputs of the system. In practice most connectionist models involve a much simpler characterization of the environment. Typically, the environment is characterized by a stable probability distribution over the set of possible input patterns, independent of past inputs and past responses of the system. In this case we can imagine listing the set of possible inputs to the system and numbering them from 1 to M. The environment is then characterized by a set of probabilities pi for i = 1, …, M. Because each input pattern can be considered a vector, it is sometimes useful to characterize those patterns with nonzero probabilities as constituting orthogonal or linearly independent sets of vectors.
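The simple, history-independent environment described above can be sketched as sampling from a fixed distribution over M input patterns. The patterns and probabilities below are made-up values for illustration:

```python
import random

# Hypothetical environment: M = 3 possible input patterns with a
# stable probability distribution p_i, independent of past inputs
# and past responses of the system.
patterns = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
probs = [0.5, 0.3, 0.2]    # the p_i; they sum to 1

def sample_input(rng):
    # Draw the pattern impinging on the input units at this moment.
    return rng.choices(patterns, weights=probs, k=1)[0]

rng = random.Random(0)
draws = [sample_input(rng) for _ in range(1000)]
# Over many draws, the empirical frequencies approximate the p_i.
print(draws[0] in patterns)   # True
```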
To summarize, the connectionist framework consists not only of a formal language but also a perspective on our models. Other qualitative and quantitative considerations arising from our understanding of brain processing and of human behavior combine with the formal system to form what might be viewed as an aesthetic for our model-building enterprises.
In addition to the fact that connectionist systems are capable of exploiting parallelism in computation and mimicking brain-style computation, such systems are important because they provide good solutions to a number of very difficult computational problems that seem to arise often in models of cognition. In particular, they are well suited to constraint-satisfaction and best-match problems of the kind described below.
CONSTRAINT SATISFACTION PROBLEMS. Many cognitive-science problems are usefully conceptualized as problems in which a solution is given through the satisfaction of a very large number of mutually interacting constraints. The challenge is to devise a computational system that is capable of efficiently solving such problems. Connectionist networks are ideal for implementing constraint-satisfaction systems; indeed, the trick for getting connectionist networks to solve difficult problems is often to cast the problems as constraint-satisfaction problems. In this case, we conceptualize the connectionist network as a constraint network in which each unit represents a hypothesis of some sort (for example, that a certain semantic feature, visual feature, or acoustic feature is present in the input), and each connection represents a constraint among the hypotheses.
Thus, for such a network, if feature B is expected to be present whenever feature A is present, there should be a positive connection from the unit corresponding to the hypothesis that A is present to the unit representing the hypothesis that B is present. Contrariwise, if there is a constraint that whenever A is present B is expected not to be present, there should be a negative connection from A to B. If the constraints are weak, the weights should be small; if the constraints are strong, then the weights should be large. Similarly, the inputs to such networks can also be thought of as constraints. A positive input to a particular unit means that there is evidence from the outside that the relevant feature is present. A negative input means that there is evidence from the outside that the feature is not present. The stronger the input, the greater the evidence. If a network of this kind is allowed to run, it will eventually settle into an optimal state in which as many as possible of the constraints are satisfied, with priority given to the strongest constraints. (Actually, the system will find a locally best solution to the constraint-satisfaction problem. Global optima are more difficult to find.) The procedure whereby such a system settles into such a state is called relaxation. We speak of the system relaxing to a solution. Thus, many connectionist models are constraint-satisfaction models that settle on locally optimal solutions through a process of relaxation.
Figure 19.2 shows an example of a simple 16-unit constraint network. Each unit in the network represents a hypothesis concerning a vertex in a line drawing of a Necker cube. The network consists of two interconnected subnetworks—one corresponding to each of the two global interpretations of the Necker cube. Each unit in each subnetwork is assumed to receive input from the region of the input figure—the cube—corresponding to its location in the network. Each unit in figure 19.2 is labeled with a three-letter sequence indicating whether its vertex is hypothesized to be front or back (F or B), upper or lower (U or L), and right or left (R or L). Thus, for example, the lower-left unit of each subnetwork is assumed to receive input from the lower-left vertex of the input figure. The unit in the left network represents the hypothesis that it is receiving input from a lower-left vertex in the front surface of the cube (and is thus labeled FLL), whereas the one in the right subnetwork represents the hypothesis that it is receiving input from a lower-left vertex in the back surface (BLL). Because there is a constraint that each vertex has a single interpretation, these two units are connected by a strong negative connection. Because the interpretation of any given vertex is constrained by the interpretations of its neighbors, each unit in a subnetwork is connected positively with each of its neighbors within the network. Finally, since there is a constraint that there can be only one vertex of each kind (for example, there can be only one lower-left vertex in the front plane, FLL), there is a strong negative connection between units representing the same label in each subnetwork. Thus each unit has three neighbors connected positively, two competitors connected negatively, and one positive input from the stimulus.
Figure 19.2
A simple network representing some constraints involved in perceiving a Necker cube. The ovals are the units in the network; connections with arrow-heads are positive (excitatory), while those with circle-heads are negative (inhibitory); the dotted lines represent input stimuli from the perceived cube.
For purposes of this example, we assume that the strengths of the connections have been arranged so that two negative inputs exactly balance three positive inputs. Further it is assumed that each unit receives an excitatory input from the ambiguous stimulus pattern and that each of these excitatory influences is relatively small. Thus, if all three of a unit’s neighbors are on and both of its competitors are on, these effects would entirely cancel out one another; and if there were a small input from the outside, the unit would have a tendency to come on. On the other hand, if fewer than three of its neighbors were on and both of its competitors were on, the unit would have a tendency to turn off, even with an excitatory input from the stimulus pattern.
In the preceding paragraphs, I focused on the individual units of the networks. It is often useful, however, to focus not on the units but on entire states of the network. In the case of binary (on-off or 0-1) units, there would be a total of 2¹⁶ possible states in which a network of this size could reside—since, in principle, each of the 16 units could have either value 0 or 1. In the case of continuous units, in which each unit can take on any value between 0 and 1, the system could in principle take on any of an infinite number of states. Yet because of the constraints built into the network, there are only relatively few of those states into which the system will ever actually settle.
To see this, consider the case in which the units are updated asynchronously, one at a time. During each time slice, one of the units is chosen to update. If its net input exceeds 0, its value will be pushed toward 1; otherwise its value will be pushed toward 0. Imagine that the system starts with all units off. A unit is then chosen at random to be updated. Because it is receiving slight positive input from the stimulus and no other inputs, it will be given a positive activation value. Then another unit is chosen to update. Unless it is in direct competition with the first unit, it too will be turned on. Eventually a coalition of neighboring units will be turned on. These units will tend to turn on more of their neighbors in the same subnetwork and turn off their competitors in the other subnetwork. The system will (almost always) end up in a situation in which all of the units in one subnetwork are fully activated and none of the units in the other subnetwork is activated. That is, the system will end up interpreting the Necker cube as either facing left or facing right. Whenever the system gets into a state and stays there, the state is called a stable state or a fixed point of the network. The constraints implicit in the pattern of connections among the units determine the set of possible stable states of the system and therefore the set of possible interpretations of the inputs.
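This settling process can be sketched in a few lines of Python. The network below is a deliberately reduced stand-in for figure 19.2: two rival coalitions of four units each, with assumed weights of +1.0 within a coalition, -1.5 between coalitions, and a +0.1 stimulus input. These sizes and values are illustrative, not the book's figure.

```python
import random

# Two rival coalitions of 4 binary units. Units within a coalition
# excite one another; units across coalitions inhibit one another.
N = 8
COALITION = [0] * 4 + [1] * 4  # units 0-3 vs. units 4-7

def weight(i, j):
    if i == j:
        return 0.0
    return 1.0 if COALITION[i] == COALITION[j] else -1.5

EXTERNAL = 0.1  # small positive input from the ambiguous stimulus

def settle(seed=0, steps=200):
    rng = random.Random(seed)
    state = [0] * N  # start with all units off
    for _ in range(steps):
        i = rng.randrange(N)  # asynchronous update: one unit at a time
        net = sum(weight(i, j) * state[j] for j in range(N)) + EXTERNAL
        state[i] = 1 if net > 0 else 0
    return state

final = settle()
print(final)
```

As the text describes, the first unit to come on recruits its own coalition and suppresses the other, so the network settles with one coalition fully on and the other fully off: a single global interpretation.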
Hopfield (1982) has shown that it is possible to give a general account of the behavior of systems like this (with symmetric weights and asynchronous updates). In particular, he has shown that such systems can be conceptualized as minimizing a global measure, which he calls the energy of the system, through a method of gradient descent or, equivalently, maximizing the constraints satisfied through a method of hill climbing. More specifically, the system operates so as to move always from a state that satisfies fewer constraints to one that satisfies more, where the measure of constraint satisfaction is given by

G = Σ_{i>j} w_ij a_i a_j + Σ_i input_i a_i

where a_i is the activation value of unit i, w_ij is the weight on the connection between units i and j, and input_i is the external input to unit i.
Essentially the equation says that the overall goodness of fit is given by the sum of the degrees to which each pair of units contributes to the goodness plus the degree to which the units satisfy the input constraints. The contribution of a pair of units is given by the product of their activation values and the weight connecting them. Thus, if the weight is positive, each unit wants to be as active as possible—that is, the activation values for those two units should be pushed toward 1. If the weight is negative, then at least one of the units should be 0 to maximize the pairwise goodness. Similarly if the input constraint for a given unit is positive, then its contribution to the total goodness of fit is maximized by bringing the activation of that unit toward its maximal value. If it is negative, the activation value should be decreased toward 0. Of course the constraints will generally not be totally consistent. Sometimes a given unit may have to be turned on to increase the function in some ways yet decrease it in other ways. The point is that it is the sum of all of these individual contributions that the system seeks to maximize. Thus, for every state of the system—every possible pattern of activation over the units—the pattern of inputs and the connectivity matrix W determine a value of the goodness-of-fit function. The system processes its input by moving upward from state to adjacent state until it reaches a state of maximum goodness. When it reaches such a stable state or fixed point, it will stay in that state and it can be said to have “settled” on a solution to the constraint-satisfaction problem or, as in our present case, “settled into an interpretation” of the input.
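The goodness-of-fit function and its maxima can be made concrete with a toy network small enough to enumerate exhaustively. The weights below are illustrative assumptions (two mutually inhibiting three-unit coalitions with a small external input), not the chapter's network; the point is that the stable states, defined as states no single-unit flip can improve, turn out to be exactly the two coherent interpretations.

```python
from itertools import product

N = 6
COALITION = [0, 0, 0, 1, 1, 1]
EXTERNAL = 0.1

def w(i, j):
    if i == j:
        return 0.0
    return 1.0 if COALITION[i] == COALITION[j] else -1.5

def goodness(state):
    # Sum over pairs of w_ij * a_i * a_j, plus the input-constraint term.
    pair_term = sum(w(i, j) * state[i] * state[j]
                    for i in range(N) for j in range(i + 1, N))
    input_term = sum(EXTERNAL * a for a in state)
    return pair_term + input_term

def is_stable(state):
    # A fixed point: flipping any single unit does not increase G.
    g = goodness(state)
    for i in range(N):
        flipped = list(state)
        flipped[i] = 1 - flipped[i]
        if goodness(flipped) > g:
            return False
    return True

maxima = [s for s in product([0, 1], repeat=N) if is_stable(s)]
print(maxima)
```

Of all 64 possible states, only the two coalition states (one subnetwork fully on, the other fully off) survive the stability test, mirroring the two interpretations of the Necker cube.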
It is important to see then that entirely local computational operations, in which each unit adjusts its activation up or down on the basis of its net input, serve to allow the network to converge toward states that maximize a global measure of goodness or degree of constraint satisfaction. Hopfield’s main contribution to the present analysis was to point out this basic fact about the behavior of networks with symmetrical connections and asynchronous update of activations.
Finally, one of the most difficult problems in cognitive science is to build systems that can allow a large number of knowledge sources to interact usefully in the solution of a problem. Thus, in language processing we would want syntactic, phonological, semantic, and pragmatic knowledge sources all to interact in the construction of the meaning of an input. Reddy et al. (1973) have had some success in the case of speech perception with the Hearsay system because they were working in the highly structured domain of language. Less structured domains have proved very difficult to organize. Connectionist models, conceived as constraint-satisfaction networks, are ideally suited for blending multiple knowledge sources. Each knowledge type is simply another constraint, and the system will, in parallel, find those configurations of values that best satisfy all of the constraints from all of the knowledge sources. The uniformity of representation and the common currency of interaction (activation values) make connectionist systems especially powerful for this domain.
To summarize, there is a large subset of connectionist models that can be considered constraint-satisfaction models. These networks can be described as carrying out their information processing by climbing into states of maximal satisfaction of the constraints implicit in the network. A very useful consequence of this way of viewing networks is that we can describe their behavior not only in terms of the behavior of individual units but also in terms of the properties of the network itself. A primary concept for understanding these network properties is the goodness-of-fit landscape over which the system moves. Once we have correctly described this landscape, we have described the operational properties of the system—it will process information by moving uphill toward goodness maxima. The particular maximum that the system will find is determined by where the system starts and by the distortions of the space induced by the input. Among the most important descriptors of a goodness landscape are the set of maxima that the system can find, the size of the region that feeds into each maximum, and the height of the maximum itself. The states themselves correspond to possible interpretations, the peaks in the space correspond to the best interpretations, the extent of the foothills or skirts surrounding a particular peak determines the likelihood of finding the peak, and the height of the peak corresponds to the degree to which the constraints of the network are actually met or alternatively to the goodness of the interpretation associated with the corresponding state.
BEST-MATCH SEARCH, PATTERN RECOGNITION, AND CONTENT-ADDRESSABLE MEMORY. These are all variants on the general best-match problem (compare Minsky and Papert, 1969). Best-match problems are especially difficult for serial computational algorithms (they involve exhaustive search), but, as we have just indicated, connectionist systems can readily be used to find the interpretation that best matches a set of constraints.
They can similarly be used to find stored data that best match some target or probe. In this case, it is useful to imagine that the network consists of two classes of units. One class, the visible units, corresponds to the contents stored in the network, in the sense that each stored pattern is a possible pattern of activation of these units. The other units, the hidden units, correspond to shared structural properties of the stored patterns that play a role in storing and retrieving them. The patterns themselves are actually stored in the weights on the connections among all these units. If we think of each stored pattern as a collection of features, then each visible unit corresponds to the hypothesis that some particular feature is present in the relevant pattern, and each hidden unit corresponds to a hypothesis concerning a configuration of several features. The hypothesis to which a particular hidden unit corresponds is determined by the exact learning rule used to store the input and by the characteristics of the ensemble of stored patterns. Retrieval in such a network amounts to setting the values of some of the visible units (the retrieval probe) and letting the system settle to the best interpretation of that input, while itself setting the values of the remaining visible units. This is a kind of pattern completion. The details are not too important here because a variety of learning rules lead to networks that all have the following important properties:
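Retrieval-as-pattern-completion can be sketched with a stripped-down auto-associator. For brevity this sketch uses visible units only (the text's hidden units are omitted), ±1 activations, and Hebbian storage of two patterns; the probe clamps half the units and the rest settle.

```python
# Content-addressable retrieval: patterns of +/-1 stored in symmetric
# Hebbian weights; a partial probe clamps some units and repeated
# threshold updates fill in the rest.
N = 8
patterns = [
    [+1, +1, +1, +1, -1, -1, -1, -1],
    [+1, -1, +1, -1, +1, -1, +1, -1],
]

# Hebbian storage: w_ij = sum over patterns of p_i * p_j (i != j).
W = [[0.0] * N for _ in range(N)]
for p in patterns:
    for i in range(N):
        for j in range(N):
            if i != j:
                W[i][j] += p[i] * p[j]

def complete(probe, clamped, sweeps=5):
    """Clamp the units in `clamped` to the probe's values and let the
    rest settle by repeated threshold updates."""
    s = list(probe)
    for _ in range(sweeps):
        for i in range(N):
            if i in clamped:
                continue
            net = sum(W[i][j] * s[j] for j in range(N))
            s[i] = 1 if net >= 0 else -1
    return s

# Probe with only the first four units of pattern 0; the rest start wrong.
probe = [+1, +1, +1, +1, +1, +1, +1, +1]
result = complete(probe, clamped={0, 1, 2, 3})
print(result)   # → the full stored pattern [+1,+1,+1,+1,-1,-1,-1,-1]
```

The half-pattern probe is enough: the free units are driven to the values they held in the best-matching stored pattern, which is the pattern-completion behavior the text describes.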
These properties correspond very closely to the characteristics of human memory and, I believe, are exactly the kind of properties we want in any theory of memory.
AUTOMATIC, SIMILARITY-BASED GENERALIZATION. One of the major complaints against AI programs is their “fragility”. The programs are usually very good at what they are programmed to do, but respond in unintelligent or odd ways when faced with novel situations. There seem to be at least two reasons for this fragility. In conventional symbol-processing systems similarity is represented only indirectly, and is therefore not available as a basis for generalizations; and most AI programs are not self-modifying and cannot adapt to their environments.
In our connectionist systems, on the other hand, similarities among patterns are directly represented along with the patterns themselves in the connection weights—in such a way that similar patterns have similar effects. Therefore, similarity-based generalization is an automatic property of connectionist models. It should be noted that the degree of similarity between patterns is roughly given by the inner product of the vectors representing the patterns. Thus the dimensions of generalization are given by the dimensions of the representational space. Often this will lead to the right generalizations. But, there are situations in which it leads to inappropriate generalizations. In such cases, we must allow the system to learn its appropriate representation. In the next section I describe how the appropriate representation can be learned so that the correct generalizations are automatically made.
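The rough similarity measure mentioned here, the inner product of the pattern vectors, is simple enough to show directly (the pattern values below are arbitrary examples):

```python
# Similarity between patterns measured as the inner product of their
# activation vectors.
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

p = [1, 0, 1, 1, 0]
q = [1, 0, 1, 0, 0]   # differs from p in one active position
r = [0, 1, 0, 0, 1]   # p's complement: no overlap at all

print(inner(p, q))  # → 2: high overlap, so p and q have similar effects
print(inner(p, r))  # → 0: no overlap, so no generalization between them
```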
LEARNING. A key advantage of connectionist systems is the fact that simple yet powerful learning procedures can be defined that allow the systems to adapt to their environments. It was work on the learning aspect of neurally inspired models that first led to an interest in them (compare Rosenblatt, 1962), and it was the demonstration that those learning procedures could not work for complex networks that contributed to the loss of interest (compare Minsky and Papert, 1969). Although the perceptron convergence procedure and its variants have been around for some time, they are limited to simple two-layer networks involving only input and output units. There were no hidden units in these cases and no internal representation. The coding provided by the external world had to suffice. Nevertheless, such networks have proved useful in a wide variety of applications. Perhaps their most important characteristic is that they map similar input patterns to similar output patterns. This is what allows them to make reasonable generalizations and perform reasonably on patterns that have never before been presented. The similarity of patterns in connectionist systems is determined by their overlap. This overlap, for two-layer networks, is determined entirely outside the learning system itself—by whatever produces the patterns.
The constraint that similar input patterns lead to similar outputs can lead to an inability of the system to learn certain mappings from input to output. Whenever the representation provided by the outside world is such that the similarity structure of the input and output patterns is very different, a network without internal representations (that is, a network without hidden units) will be unable to perform the necessary mappings. A classic example of this case is the exclusive-or (XOR) problem illustrated in table 19.1. Here we see that those patterns that overlap least are supposed to generate identical output values. This problem and many others like it cannot be solved by networks that lack hidden units with which to create their own internal representations of the input patterns. It is interesting to note that if the input patterns contained a third input bit, taking the value 1 when and only when the other two were both 1 (as shown in table 19.2), a two-layer system would be able to solve the problem.
| Input patterns | | Output patterns |
|---|---|---|
| 00 | → | 0 |
| 01 | → | 1 |
| 10 | → | 1 |
| 11 | → | 0 |
Table 19.1
XOR problem.
| Input patterns | | Output patterns |
|---|---|---|
| 000 | → | 0 |
| 010 | → | 1 |
| 100 | → | 1 |
| 111 | → | 0 |
Table 19.2
XOR problem with redundant third bit.
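The claim that the three-bit version of table 19.2 is solvable by a two-layer system can be checked with the perceptron convergence procedure itself. The learning rate of 1 and the epoch cap are illustrative choices; for linearly separable data the procedure is guaranteed to stop making errors.

```python
# Perceptron convergence procedure on table 19.2: the redundant third
# bit (1 exactly when both other bits are 1) makes XOR linearly
# separable, so the procedure converges. On the raw two-bit XOR of
# table 19.1 it never would.
examples = [
    ((0, 0, 0), 0),
    ((0, 1, 0), 1),
    ((1, 0, 0), 1),
    ((1, 1, 1), 0),
]

w = [0.0, 0.0, 0.0]
bias = 0.0

def predict(x):
    net = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1 if net > 0 else 0

# Cycle through the examples, nudging weights on every error, until a
# full pass makes no mistakes.
for _ in range(100):
    errors = 0
    for x, target in examples:
        delta = target - predict(x)
        if delta != 0:
            errors += 1
            w = [wi + delta * xi for wi, xi in zip(w, x)]
            bias += delta
    if errors == 0:
        break

print([predict(x) for x, _ in examples])  # → [0, 1, 1, 0]
```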
Minsky and Papert (1969) have provided a careful analysis of conditions under which such systems are capable of carrying out the required mappings. They show that in many interesting cases networks of this kind are incapable of solving the problems. On the other hand, as Minsky and Papert also point out, if there is a layer of simple perceptron-like hidden units, as shown in figure 19.3, with which the original input pattern can be augmented, there is always a recoding (that is, an internal representation) of the input patterns in the hidden units in which the similarity of the patterns among the hidden units can support any required mapping from the input to the output units. Thus if we have the right connections from the input units to a large enough set of hidden units, we can always find a representation that will perform any mapping from input to output through these hidden units. In the case of the XOR problem, the addition of a feature that detects the conjunction of the input units changes the similarity structure of the patterns sufficiently to allow the solution to be learned.
Figure 19.3
A multilayer network in which input patterns are recoded by internal representation units.
As illustrated in figure 19.4, this can be done with a single hidden unit. The numbers on the arrows represent the strengths of the connections among the units. The numbers written in the circles represent the thresholds of the units. The value of +1.5 for the threshold of the hidden unit ensures that it will be turned on only when both input units are on. The threshold of 0.5 for the output unit ensures that it will turn on only when it receives a net positive input greater than 0.5. The weight of -2 from the hidden unit to the output unit ensures that the output unit will not come on when both input units are on. Note that, from the point of view of the output unit, the hidden unit is treated as simply another input unit. It is as if the input patterns consisted of three rather than two units (essentially as in table 19.2).
Figure 19.4
A simple XOR network with one hidden unit.
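The network of figure 19.4 can be wired by hand. The text gives the hidden unit's threshold (+1.5), the output unit's threshold (0.5), and the hidden-to-output weight (-2); the remaining connection weights are assumed here to be +1, as in the figure.

```python
# The hand-wired XOR network of figure 19.4, with one hidden unit.
def xor_net(x1, x2):
    # Hidden unit: threshold +1.5, so it fires only when both inputs are on.
    hidden = 1 if (1 * x1 + 1 * x2) > 1.5 else 0
    # Output unit: threshold 0.5; the -2 weight from the hidden unit
    # vetoes the output when both inputs are on.
    output = 1 if (1 * x1 + 1 * x2 - 2 * hidden) > 0.5 else 0
    return output

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [0, 1, 1, 0]
```

From the output unit's perspective the hidden unit is just a third input, exactly the redundant bit of table 19.2.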
The existence of networks such as this illustrates the potential power of hidden units and internal representations. The problem, as noted by Minsky and Papert, is that, whereas there is a very simple guaranteed learning rule for all problems that can be solved without hidden units—namely, the perceptron convergence procedure (or the variation reported originally by Widrow and Hoff, 1960)—there has been no equally powerful rule for learning in multilayer networks.
It is clear that if we hope to use these connectionist networks for general computational purposes, we must have a learning scheme capable of learning its own internal representations. This is just what we (Rumelhart et al., 1986a) have done. We have developed a generalization of the perceptron learning procedure, called the generalized delta rule, which allows the system to learn to compute arbitrary functions. The constraints inherent in networks without self-modifying internal representations are no longer applicable. The basic learning procedure is a two-stage process. First, an input is applied to the network. Then, after the system has processed for some time, certain units of the network—usually the output units—are informed of the values they ought to have attained. If they have attained the desired values, the weights on their input connections are left unchanged. If they differ from their target values, then those weights are changed slightly, in such a way as to reduce the differences between the actual values attained and the target values.
Those differences between the actual and target values at the output units can be thought of as error signals. Similar error signals must be sent back in turn to those units that impinged on the output units. Each such unit receives an error signal that is equal to the sum of the errors in each of the output units to which it connects times the weight on the connection to that output unit. Then, based on those error signals, the weights on the input connections into those “second-layer” units can be modified, after which error signals can be passed back another layer. This process—called the backpropagation of error—continues until the error signals reach the input units or until they have been passed back a predetermined number of times. Then a new input pattern is presented and the process repeats. Although the procedure may sound difficult, it is actually quite simple and easy to implement within these nets. As shown in Rumelhart et al. (1986a), such a procedure will always change its weights in such a way as to reduce the overall difference between the actual output values and the desired output values. Moreover it can be shown that this system will work for any network whatsoever.
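The two-stage procedure can be sketched in pure Python on the XOR task: a forward pass, an error signal at the output, hidden-unit error signals obtained through the output weights, and small downhill weight changes. The network size (three hidden units), learning rate, and epoch count are illustrative choices, not values from the text; the deltas follow the standard generalized delta rule for sigmoid units.

```python
import math, random

rng = random.Random(1)
N_IN, N_HID = 2, 3
lr = 0.5

# Small random initial weights: input->hidden, hidden biases,
# hidden->output, output bias.
w_ih = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b_h = [rng.uniform(-1, 1) for _ in range(N_HID)]
w_ho = [rng.uniform(-1, 1) for _ in range(N_HID)]
b_o = rng.uniform(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_ih, b_h)]
    o = sigmoid(sum(w * hi for w, hi in zip(w_ho, h)) + b_o)
    return h, o

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

before = total_error()
for _ in range(3000):
    for x, t in data:
        h, o = forward(x)
        # Output error signal, then hidden error signals obtained by
        # passing it back through the output weights.
        d_o = (t - o) * o * (1 - o)
        d_h = [d_o * w_ho[i] * h[i] * (1 - h[i]) for i in range(N_HID)]
        for i in range(N_HID):
            w_ho[i] += lr * d_o * h[i]
            for j in range(N_IN):
                w_ih[i][j] += lr * d_h[i] * x[j]
            b_h[i] += lr * d_h[i]
        b_o += lr * d_o
after = total_error()
print(before, after)   # the summed squared error drops over training
```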
Minsky and Papert, in their pessimistic discussion of perceptrons, discuss multilayer machines. They state that
The perceptron has shown itself worthy of study despite (and even because of!) its severe limitations. It has many features that attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile. Perhaps some powerful convergence theorem will be discovered, or some profound reason for the failure to produce an interesting “learning theorem” for the multilayered machine will be found. (1969, 231–232)
Although our learning results do not guarantee that we can find a solution for all solvable problems, our analysis and simulation results have shown that, as a practical matter, this error-propagation scheme leads to solutions in virtually every case. In short, I believe that we have answered Minsky and Papert’s challenge and have found a learning result sufficiently powerful to demonstrate that their pessimism about learning in multilayer machines was misplaced.
One way to view the procedure I have been describing is as a parallel computer that, having been shown the appropriate input/output exemplars specifying some function, programs itself to compute that function in general. Parallel computers are notoriously difficult to program. Here we have a mechanism whereby we do not actually have to know how to write the program to get the system to do it.
GRACEFUL DEGRADATION. Finally, connectionist models are interesting candidates for cognitive-science models because of their property of graceful degradation in the face of damage and information overload. The ability of our networks to learn leads to the promise of computers that can literally learn their way around faulty components: because every unit participates in the storage of many patterns and because each pattern involves many different units, the loss of a few components will degrade the stored information, but will not destroy it. Similarly such memories should not be conceived as having a certain fixed capacity. Rather, there is simply more and more storage interference and blending of similar pieces of information as the memory is overloaded. This property of graceful degradation mimics the human response in many ways and is one of the reasons we find these models of human information processing plausible.
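The damage-tolerance claim can be illustrated with a small auto-associator. In this sketch one ±1 pattern is stored Hebbian-style, a block of the weights is then knocked out, the probe has three bits flipped, and retrieval still succeeds because every unit shares in storing the pattern. The sizes, damage pattern, and stored pattern are all illustrative assumptions.

```python
# Graceful degradation: damaged weights degrade but do not destroy
# the stored information.
N = 16
p = [+1 if i % 3 else -1 for i in range(N)]  # an arbitrary stored pattern

# Hebbian storage of the single pattern: w_ij = p_i * p_j (i != j).
W = [[(p[i] * p[j] if i != j else 0.0) for j in range(N)] for i in range(N)]

# Damage: destroy every connection among the first four units.
for i in range(4):
    for j in range(4):
        W[i][j] = 0.0

probe = list(p)
for i in (0, 1, 2):      # corrupt three bits of the probe as well
    probe[i] = -probe[i]

state = list(probe)
for _ in range(3):       # a few full update sweeps
    for i in range(N):
        net = sum(W[i][j] * state[j] for j in range(N))
        state[i] = 1 if net >= 0 else -1

print(state == p)   # → True: the stored pattern is still recovered
```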
Recent years have seen a virtual explosion of work in the connectionist area. This work has been singularly interdisciplinary, being carried out by psychologists, physicists, computer scientists, engineers, neuroscientists, and other cognitive scientists. A number of national and international conferences have been established and are being held each year. In such an environment it is difficult to keep up with the rapidly developing field. Nevertheless, a reading of recent papers indicates a few central themes in this activity. These themes include the study of learning and generalization (especially the use of the backpropagation learning procedure), applications to neuroscience, mathematical properties of networks—both of the learning algorithms and of connectionist style computation itself in comparison to more conventional computational paradigms—and finally the development of an implementational base for physical realizations of connectionist computational devices, especially in the areas of optics and analog VLSI.
Although there are many other interesting and important developments, I conclude with a brief summary of the work with which I have been most involved over the past several years, namely, the study of learning and generalization within multilayer networks. Even this summary is necessarily selective, but it should give a sampling of much of the current work in the area.
The backpropagation learning procedure has become possibly the single most popular method for training networks. The procedure has been used to train networks on problem domains including character recognition, speech recognition, sonar detection, mapping from spelling to sound, motor control, analysis of molecular structure, diagnosis of eye diseases, prediction of chaotic functions, playing backgammon, the parsing of simple sentences, and many, many more areas. Perhaps the major point of these examples is the enormous range of problems to which the backpropagation learning procedure can usefully be applied. In spite of the rather impressive breadth of topics and the success of some of these applications, there are a number of serious open problems. The theoretical issues of primary concern fall into three main areas. (1) The architecture problem: are there useful architectures beyond the standard three-layer network that are appropriate for certain areas of application? (2) The scaling problem: how can we cut down on the substantial training time that seems to be involved for more difficult and interesting problems? (3) The generalization problem: how can we be certain that a network trained on a subset of the example set will generalize correctly to the entire set of examples?
Although most applications have involved the simple three-layer backpropagation network with one input layer, one hidden layer, and one output layer of units, there have been a large number of interesting architectures proposed—each for the solution of some particular problem of interest. There are, for example, a number of “special” architectures that have been proposed for the modeling of such sequential phenomena as motor control. Perhaps the most important of these is the one proposed by Mike Jordan (1986a) for producing sequences of phonemes. The basic structure of the network is illustrated in figure 19.5. It consists of four groups of units. Plan units, which tell the network which sequence it is producing, are fixed at the start of a sequence and are not changed. Context units, which keep track of where the system is in the sequence, receive input from the output units of the system and from themselves, constituting a memory for the sequence produced thus far. Hidden units combine the information from the plan units with that from the context units to determine which output is to be produced next. Output units produce the desired output values. This basic structure, with numerous variations, has been used successfully in producing sequences of phonemes (Jordan, 1986a), sequences of movements (Jordan, 1988), sequences of notes in a melody (Todd, 1989), sequences of turns in a simulated ship (Miyata, 1987), and for many other applications. An analogous network for recognizing sequences has been used by Elman (1988) for processing sentences one at a time; and another variation has been developed and studied by Mozer (1988). The architecture used by Elman is illustrated in figure 19.6.
This network also involves four sets of units: input units, in which the sequence to be recognized is presented one element at a time; context units, which receive inputs from and send outputs to the hidden units and thus constitute a memory for recent events; hidden units, which combine the current input with the memory of past inputs either to name the sequence, to predict the next element of the sequence, or both; and, of course, output units.
Figure 19.5
A recurrent network of the type developed by Jordan (1986a) for learning to perform sequential operations.
Figure 19.6
A recurrent network of the type employed by Elman (1988) for learning to recognize sequences.
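Both architectures turn on the same recurrent step: the hidden units mix the current input with a context vector that carries a memory of the sequence so far. The following is a minimal pure-Python sketch of an Elman-style step; all weights, layer sizes, and the input sequence are invented for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elman_step(x, context, W_in, W_ctx, W_out):
    """One forward step of an Elman-style recurrent network.

    Hidden units combine the current input with the context units
    (a copy of the previous hidden state); the new hidden state is
    returned so it can serve as the context on the next step.
    """
    n_hidden = len(W_ctx)
    hidden = []
    for j in range(n_hidden):
        net = sum(x[i] * W_in[i][j] for i in range(len(x)))
        net += sum(context[k] * W_ctx[k][j] for k in range(n_hidden))
        hidden.append(sigmoid(net))
    output = [sigmoid(sum(hidden[j] * row[j] for j in range(n_hidden)))
              for row in W_out]
    return output, hidden

# Process a sequence one element at a time, carrying the context forward.
W_in = [[0.4, -0.3]]               # 1 input unit   -> 2 hidden units
W_ctx = [[0.2, 0.1], [-0.1, 0.3]]  # 2 context units -> 2 hidden units
W_out = [[0.5, -0.6]]              # 2 hidden units  -> 1 output unit
context = [0.0, 0.0]
outputs = []
for x in ([1.0], [0.0], [1.0]):
    out, context = elman_step(x, context, W_in, W_ctx, W_out)
    outputs.append(out[0])
```

In Jordan's variant the context would instead be fed from the output units together with its own previous state; the step itself is otherwise the same.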
Another kind of architecture that has received some attention was suggested by Hinton and Sejnowski (1986) and has been employed by Elman and Zipser (1987), Cottrell et al. (1987), and many others. It has become part of the standard toolkit of backpropagation. This is the so-called method of autoencoding the pattern set. The basic architecture in this case consists of three layers of units as in the conventional case; however, the input and output layers are identical. The idea is to pass the input through a small number of hidden units and reproduce it over the output units. This requires the hidden units to do a kind of nonlinear principal-components analysis of the input patterns. In this case, that corresponds to a kind of extraction of critical features. In many applications, these features turn out to provide a useful compact description of the patterns. Many other architectures are being explored as well. The space of interesting and useful architectures is large, and the exploration will continue for many years.
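The scheme can be sketched in a few lines. The toy autoencoder below is linear for brevity (real applications used nonlinear hidden units), and its sizes, patterns, and learning rate are all invented for illustration: four-dimensional patterns are squeezed through a two-unit bottleneck, with each input serving as its own target.

```python
import random

random.seed(0)

# Tiny linear autoencoder: 4 visible units, a 2-unit bottleneck.
# The targets are the inputs themselves, so the hidden layer is
# forced to find a compact (PCA-like) code for the pattern set.
n_vis, n_hid = 4, 2
W_enc = [[random.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
W_dec = [[random.uniform(-0.1, 0.1) for _ in range(n_vis)] for _ in range(n_hid)]

# These four patterns span only a 2-D subspace, so a 2-unit code
# suffices for (near-)perfect reconstruction.
patterns = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]

lr = 0.05
for epoch in range(3000):
    for x in patterns:
        h = [sum(x[i] * W_enc[i][j] for i in range(n_vis)) for j in range(n_hid)]
        y = [sum(h[j] * W_dec[j][k] for j in range(n_hid)) for k in range(n_vis)]
        err = [x[k] - y[k] for k in range(n_vis)]   # target = input
        for j in range(n_hid):                      # gradient step, decoder
            for k in range(n_vis):
                W_dec[j][k] += lr * err[k] * h[j]
        for i in range(n_vis):                      # gradient step, encoder
            for j in range(n_hid):
                W_enc[i][j] += lr * x[i] * sum(err[k] * W_dec[j][k]
                                               for k in range(n_vis))

def reconstruct(x):
    h = [sum(x[i] * W_enc[i][j] for i in range(n_vis)) for j in range(n_hid)]
    return [sum(h[j] * W_dec[j][k] for j in range(n_hid)) for k in range(n_vis)]

mse = sum(sum((x[k] - reconstruct(x)[k]) ** 2 for k in range(n_vis))
          for x in patterns) / len(patterns)
```

After training, the two hidden activations constitute the compact description of each pattern that the text refers to.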
The scaling problem has received somewhat less attention, although it has clearly emerged as a central problem with backpropagation-like learning procedures. The basic finding has been that difficult problems require many learning trials. For example, it is not unusual to require tens or even hundreds of thousands of pattern presentations to learn moderately difficult problems—that is, problems whose solutions require tens of thousands to a few hundred thousand connections. Large and fast computers are required for such problems, and it is impractical to attack problems requiring more than a few hundred thousand connections. It is therefore a matter of concern to learn how to speed up learning so that more difficult problems can be learned in a more reasonable number of exposures. The proposed solutions fall into two basic categories. One line of attack is to improve the learning procedure, either by optimizing the parameters dynamically (that is, changing the learning rate systematically during learning) or by using more information in the weight-changing procedure (that is, the so-called second-order backpropagation procedure, in which the second derivatives are also computed). Although some improvements can be attained by these methods, in certain problem domains the basic scaling problem still remains. It seems that the basic problem is that difficult problems require a large number of exemplars, however efficiently each exemplar is used. The other approach grows from viewing learning and evolution as continuous with one another. On this view, the fact that networks take a long time to learn is to be expected, because we normally compare their behavior to that of organisms that have long evolutionary histories. Accordingly, the solution is to start the systems at places that are as pre-suited as possible for the problem domains to be learned. Shepard (1989) has argued that such an approach is critical for an appropriate understanding of the phenomena being modeled.
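As an illustration of the first line of attack, here is one simple way of "optimizing the parameters dynamically": a bold-driver-style schedule that grows the learning rate while the error keeps falling and cuts it sharply when a step overshoots. The one-dimensional quadratic error surface and all the constants are invented for illustration.

```python
# "Bold driver" style schedule: grow the rate while the error keeps
# falling, cut it sharply when a step overshoots.
def bold_driver(grad, loss, w0, lr=0.1, up=1.05, down=0.5, steps=100):
    w, prev = w0, loss(w0)
    for _ in range(steps):
        trial = w - lr * grad(w)
        cur = loss(trial)
        if cur <= prev:
            w, prev, lr = trial, cur, lr * up   # accept step, speed up
        else:
            lr *= down                          # reject step, slow down
    return w, prev

# Toy error surface: a 1-D quadratic with its minimum at w = 3.
w, err = bold_driver(grad=lambda w: 2 * (w - 3),
                     loss=lambda w: (w - 3) ** 2,
                     w0=0.0)
```

The same accept/reject logic can wrap any gradient step; second-order methods attack the identical bottleneck by using curvature information instead of a heuristic schedule.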
A final approach to the scaling problem is through modularity. Sometimes it is possible to break a problem into smaller subproblems and train subnetworks separately on these. Larger networks can then be assembled from those pretrained modules to solve the original problem. An advantage of the connectionist approach in this regard is that the preliminary training need only be approximately right. A final round of training can be used after assembly to learn the interfaces among the modules.
One final aspect of learning that has been looked at is the nature of generalization. It is clear that the most important aspect of networks is not that they learn a set of mappings but that they learn the function implicit in the exemplars under study in such a way that they respond properly to cases not yet observed. Although there are many examples of successful generalization (e.g., the learning of spelling-to-phoneme mappings in Sejnowski and Rosenberg’s NETtalk, 1987), there are a number of cases in which the networks do not generalize correctly (see Denker et al., 1987). One simple way to understand this is to note that for most problems there are enough degrees of freedom in the network that there are a large number of genuinely different solutions to the problems—each of which constitutes a different way of generalizing to unseen patterns. Clearly not all of these can be correct.
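The point can be made with the smallest possible example: two hypotheses that agree on every training case yet embody genuinely different generalizations. The functions and training points below are invented for illustration.

```python
# Two hypotheses that fit the training data equally well.
def f_simple(x):
    return 2 * x        # a line through (0, 0) and (2, 4)

def f_complex(x):
    return x ** 2       # a parabola through the same two points

train = [0, 2]
agree_on_train = all(f_simple(x) == f_complex(x) for x in train)

# On an unseen case the two "solutions" disagree: 6 versus 9.
disagreement_at_3 = (f_simple(3), f_complex(3))
```

A trained network is in the same position: its weights pick out one of many functions consistent with the exemplars, and nothing in the training data alone says which generalization is the right one.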
Weigend and I have proposed an hypothesis that shows some promise in promoting better generalization (Weigend and Rumelhart, 1991). The basic idea is this: the problem of generalization is essentially the induction problem. Given a set of observations, what is the appropriate principle that applies to all cases? Note that the network at any point in time can be viewed as a specification of an inductive hypothesis. Our proposal is that we follow a version of Occam’s razor and select the simplest, most robust network that is consistent with the observations made. The assumption of robustness is simply an embodiment of a kind of continuity assumption that small variations in the input pattern should have little effect on the output or on the performance of the system. The simplicity assumption is simply to choose—of all networks that correctly account for the input data—the net with the fewest hidden units, the fewest connections, the most symmetries among the weights, and so on. We have formalized this procedure and modified the backpropagation learning procedure so that it prefers simple, robust networks, and, all things being equal, will select those networks. In many cases it turns out that these are just the networks that do the best job generalizing.
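One concrete form this simplicity bias takes is the weight-elimination penalty of Weigend and Rumelhart, in which each weight adds a cost of (w/w0)^2 / (1 + (w/w0)^2) to the error being minimized. The sketch below shows the penalty and its derivative; the particular constants (the scale w0 and the penalty strength lam) are illustrative choices, tuned per problem in practice.

```python
def complexity_cost(weights, w0=1.0, lam=0.1):
    # Weight-elimination penalty: (w/w0)^2 / (1 + (w/w0)^2) per weight.
    # A large weight costs about lam (it counts as one unit of
    # complexity); a small weight costs almost nothing, so training
    # drives unneeded weights toward zero.
    return lam * sum((w / w0) ** 2 / (1 + (w / w0) ** 2) for w in weights)

def complexity_grad(w, w0=1.0, lam=0.1):
    # Derivative of one weight's penalty term; during training it is
    # simply added to the ordinary backpropagation gradient.
    r = (w / w0) ** 2
    return lam * 2 * w / (w0 ** 2 * (1 + r) ** 2)
```

Minimizing error plus this penalty is what makes the procedure prefer, among all networks that account for the data, those with the fewest effective connections.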
Patricia Churchland and Terrence Sejnowski
1992
Computational neuroscience is an evolving approach that aims to discover the properties characterizing and the principles governing neurons and networks of neurons. It draws on both neurobiological data and computational ideas to investigate how neural networks can produce complex effects such as stereo vision, learning, and auditory location of sound-emitting objects. To put it crudely, it has one foot in neuroscience and one foot in computer science. A third foot is firmly planted in experimental psychology, and at least a toe is in philosophy, so evidently the enterprise is multipedal. Of which more anon.
Probably the closest academic kin of computational neuroscience is systems neurobiology, a branch of neuroscience that traditionally has focused on much the same set of problems, but did not explicitly ally itself with computer modeling or with an avowedly information-processing framework for theories. A precocious ancestor went by the name of “cybernetics,” which, inversely to systems neurobiology, generally leaned more heavily on the engineering and psychophysical sides, and more lightly on the neurobiological side. Coined more recently, “connectionism” usually refers to modeling with networks that bear only superficial similarities to real neural networks, while “neural net modeling” can cover a broad range of projects. Ironically perhaps, “neural net modeling” is usually identified with computer modeling of highly artificial nonneuronal networks, often with mainly technological significance such as medical diagnoses in emergency wards.1 “PDP” (“parallel distributed processing”) is generally the preferred label of cognitive psychologists and some computer scientists who seek to model rather high-level activity such as face recognition and language learning rather than lower-level activity such as visual motion detection or defensive bending in the leech.
As we use the term, “computational neuroscience” aims for biological realism in computational models of neural networks, though en route, rather simplified and artificial models may be used to help test and explore computational principles. Academic garden-plotting is a comically imprecise trade because the carrots regularly wander in with turnips and the turnips with the potatoes. Each of us (P.S.C. and T.J.S.) is cheerfully guilty of wandering into neuroscience from our mother disciplines, so we emphatically do not mean to tut-tut academic “cross-fielding.” On the contrary, we view the blurring of the disciplinary boundaries between neuroscience, computer science, and psychology as a healthy development to be wisely encouraged. In any case, perhaps a crude survey will help orient the greenhorn—or even the old hand—to the clustering of goals, tactics, and prejudices manifest in the “network” game.
The expression “computational” in computational neuroscience reflects the role of the computer as a research tool in modeling complex systems such as networks, ganglia, and brains. Using the word in that sense, one could also have computational astronomy or computational geology. In the present context, however, the word’s primary force is its descriptive connotation, which here betokens the deep-seated conviction that what is being modeled by a computer is itself a kind of computer, albeit one quite unlike the serial, digital machines on which computer science cut its teeth. That is, nervous systems, and probably parts of nervous systems, are themselves naturally evolved computers—organically constituted, analog in representation, and parallel in their processing architecture. They represent features and relations in the world and they enable an animal to adapt to its circumstances. They are a breed of computer whose modus operandi still eludes us but is, so to speak, the mother lode of computational neuroscience.
A number of broad clues about computation in nervous systems are available. First, unlike a digital computer which is general purpose and can be programmed to run any algorithm, the brain appears to be an interconnected collection of special-purpose systems that are very efficient at performing their tasks but limited in their flexibility. Visual cortex, for example, does not appear able to assume the functions of the cerebellum or the hippocampus. Presumably this is not because visual cortex contains cells that are essentially and intrinsically visual in what they do (or contain “visons” instead of “auditons”), but rather it is mainly because of their morphological specialization and of their place in the system of cells in visual cortex, i.e., relative to their input cells, their intracortical and subcortical connections, their output cells, and so on. Put another way, a neuron’s specialization is a function of the neuron’s computational roles in the system, and evolution has refined the cells better to perform those roles.
Second, the clues about the brain’s computational principles that can be gleaned from studying its microstructure and organization are indispensable to figuring out its computational organization because the nervous system is a product of evolution, not engineering design. Evolutionary modifications are always made within the context of an organization and architecture that are already in place. Quite simply, Nature is not an intelligent engineer. It cannot dismantle the existing configuration and start from scratch with a preferred design or preferred materials. It cannot mull the environmental conditions and construct an optimal device. Consequently, the computational solutions evolved by Nature may be quite unlike those that an intelligent human would invent, and they may well be neither optimal nor predictable from orthodox engineering assumptions.
Third, human nervous systems are by no means exclusively cognitive devices, though the infatuation with cognition fosters a tacit tendency to assume so. Nervous systems must also manage such matters as thermoregulation—a very complex function for mammals—growth, aspects of reproduction, respiration, regulation of hunger, thirst, and motor control and maintenance of behavioral state, such as sleeping, dreaming, being awake, and so forth. Thus an evolutionary modification that results in a computational improvement in vision, say, might seem to have the earmarks of an engineering prizewinner. But if it cannot mesh with the rest of the brain’s organization, or if it marginalizes critical functions such as thermoregulation, the animal and its “prizewinning” vision genes will die. Given these reasons, reverse engineering, where the device is taken apart to see how it works, is a profitable strategy with respect to the brain. By contrast, a purely a priori approach, based entirely on reasonable principles of engineering design, may lead us down a blind alley.
Fourth, it is prudent to be aware that our favorite intuitions about these matters may be misleading, however “self-evident” and compelling they be. More specifically, neither the nature of the computational problems the nervous system is solving nor the difficulty of the problems confronting the nervous system can be judged merely by introspection. Consider, for example, a natural human activity such as walking—a skill that is typically mastered in the first year or so of life. One might doubt whether this is a computational problem at all, or if it is, whether it is a problem of sufficient complexity to be worth one’s reflection. Since walking is virtually effortless, unlike, say, doing algebra, which many people do find a strain, one might conclude from casual observation that walking is a computationally easy task—easier, at least, than doing algebra. The preconception that walking is computationally rather trivial is, however, merely an illusion. It is easy enough for toy manufacturers to make a doll that puts one foot in front of the other as long as she is held by the child. But for the doll to walk as we do, maintaining balance as we do, is a completely different task. Locomotion turns out to be a complicated matter, the ease implied by introspection notwithstanding.
Another computational issue of critical importance in generating hypotheses in computational neuroscience concerns the time available for performing the computation. From the point of view of the nervous system, it is not enough to come up with solutions that merely give the correct output for a given input. The solutions must also be available within milliseconds of the problem’s presentation, and applications must be forthcoming within a few hundred milliseconds. It is important that nervous systems can routinely detect signals, recognize patterns, and assemble responses within one second. The ability of nervous systems to move their encasing bodies appropriately and swiftly was typically selected at every stage of evolution, since by and large natural selection would favor those organisms that could flee or fight predators, and catch and cache prey. Ceteris paribus, slow nervous systems become dinner for faster nervous systems. Even if the computational strategies used by the brain should turn out not to be elegant or beautiful but to have a sort of evolutionary do-it-yourself quality, they are demonstrably very fast. This tiny response time rules out as just too slow many kinds of ostensibly elegant computational architectures and clever computational principles. This point is all the more significant when it is considered that events in an electronic computer happen in the nanosecond (10^-9 second) range, whereas events in neurons happen in the millisecond (10^-3 second) range.
A related consideration is that organic computers such as brains are constrained in the amount of space available for the essential elements—cell bodies, dendrites, axons, glial cells, and vascularization—and the cranial capacity is in turn limited by the mechanisms of reproduction. In mammals, for example, the size of the pelvic cavity of the mother constrains head size of offspring, and therefore brain size of offspring. What this all means is that the length of wiring in nervous systems must also be limited—evolution cannot just help itself to indefinite lengths of connecting wire but must make every centimeter count. In a human brain, for example, the total length of wiring is about 10^8 meters and it has to be packed into a volume of about 1.5 liters. The spatial configuration of sense organs and muscles on the body and the relative position of the afferent and efferent systems will also be relevant to the computational genre that has been selected in the evolution of nervous systems. One strategy the brain uses to economize on wire is to map the processing units so that neighboring units process similar representations. Another strategy involves sharing wire, meaning that the same wire (axon) can be used in coding a large range of representations (Mead, 1989). The computational genre adopted for a nervous system will, therefore, be constrained not only by temporal factors but also by spatial factors.
Computation is also limited by power consumption, and on this matter too the brain is impressively efficient. For example, a neuron uses roughly 10^-15 joules of energy per operation (e.g., one neuron activating another at a synapse). By contrast, the most efficient silicon technology currently requires about 10^-7 joules per operation (multiply, add, etc.) (Mead, 1989). Using the criterion of joules per operation, the brain is about 7 or 8 orders of magnitude more power efficient than the best of the silicon chips. A direct consequence of this energy efficiency is that brains can perform many more operations per second than even the newest supercomputers. The fastest digital computers are capable of around 10^9 operations per second; the brain of the common housefly, for example, performs about 10^11 operations per second when merely resting.
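A quick check of the arithmetic in these comparisons (the figures are simply those cited in the text, via Mead, 1989, not independent measurements): the energy ratio works out to exactly eight orders of magnitude, and the resting housefly comes out roughly a hundredfold ahead of the fastest digital computer in operations per second.

```python
import math

# Energy per elementary operation, as quoted in the text.
neuron_joules_per_op = 1e-15    # one neuron activating another at a synapse
silicon_joules_per_op = 1e-7    # best current silicon, per multiply/add

# Orders of magnitude separating the two: log10(1e-7 / 1e-15) = 8.
efficiency_gap = math.log10(silicon_joules_per_op / neuron_joules_per_op)

# Throughput, as quoted: fastest digital computer vs. resting housefly.
computer_ops_per_sec = 1e9
housefly_ops_per_sec = 1e11
speed_ratio = housefly_ops_per_sec / computer_ops_per_sec   # factor of 100
```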
Finally, there are constraints imposed by the materials of construction. That is, cells are made out of proteins and lipids and have to rely on mitochondria for their energy supply; nervous systems must have the substances and dispositions necessary for growth and development, and they must exploit such features as the membrane properties of cells and the available chemicals in order to function as an organic computer. Additionally, the nervous system needs a constant supply of oxygen and a reliable supply of nutrients. Evolution has to make what it can out of proteins, lipids, membranes, amino acids, etc. This is not altogether unlike the engineering make-do game where the given materials are limited (a finite number of popsicle sticks, rubber bands, and paper clips) and the task, for example, is to build a weight-supporting bridge. Indeed, John Allman (1990) has suggested that brain expansion in homeotherms was spurred by the need to engage in intense prey-catching in order to keep the home fires burning, as it were. In the competition for large amounts of fuel, homeotherms with sophisticated neural machinery that upgraded prey-catching and predator avoidance would have had an advantage.
Two conceptual ideas have structured much of how we tend to conceive of problems in computational neuroscience. The first is the notion of levels; the second concerns the co-evolution of research on different levels. In the brain, there is both large-scale and small-scale organization, and different functions take place at higher and lower levels (figure 20.1). One sort of account will explain how signals are integrated in dendrites; a different account will explain the interaction of neurons in a network, or the interaction of networks in a system.2 A model that captures the salient features of learning in networks will have a different face from a model that describes the NMDA channel. Nevertheless, the theories on one level must mesh with the theories of levels both higher and lower, because an inconsistency or a lacuna somewhere in the tale means that some phenomenon has been misunderstood. After all, brains are assemblies of cells, and something would be seriously amiss if neurons under one description had properties incompatible with the same neurons under another description.
Figure 20.1
Schematic illustration of levels of organization in the nervous system. The spatial scales at which anatomical organizations can be identified vary over many orders of magnitude. Icons to the right represent structures at distinct levels: (top) a subset of visual areas in visual cortex (van Essen and Maunsell, 1980); (middle) a network model of how ganglion cells could be connected to simple cells in visual cortex (Hubel and Wiesel, 1962); and (bottom) a chemical synapse (Kandel and Schwartz, 1985). (From Churchland and Sejnowski, 1988.)
Discussions concerning the nature of psychological phenomena and their neurobiological bases invariably make reference to the notion of “levels.” In trying to be a bit more precise about what is meant by “level,” we found three different ideas about levels in the literature: levels of analysis, levels of organization, and levels of processing. Roughly speaking, the distinctions are drawn along the following lines: levels of organization are essentially anatomical, and refer to a hierarchy of components and to structures comprising these components. Levels of processing are physiological, and refer to the location of a process relative to the transducers and muscles. Levels of analysis are conceptual and refer to different kinds of questions asked about how the brain performs a task: into what subtasks does the brain divide the tasks, what processing steps execute a subtask, and what physical structures carry out the steps? In what follows, we elaborate on these distinctions.
A framework for a theory of levels, articulated by Marr (1982), provided an important and influential background for thinking about levels in the context of computation by nervous structures.3 This framework drew upon the conception of levels in computer science, and accordingly Marr characterized three levels: (1) the computational level of abstract problem analysis, decomposing the task (e.g., determining the 3-D depth of objects from the 2-D pattern on the retina) into its main constituents; (2) the level of the algorithm, specifying a formal procedure to perform the task so that for a given input, the correct output results; and (3) the level of physical implementation, constructing a working device using a particular technology. This division really corresponds to three different sorts of questions that can be raised about a phenomenon: (1) how does the problem decompose into parts?, (2) what principles govern how the parts interact to solve the problem?, and (3) what is the stuff whose causal interactions implement the principles?
An important element in Marr’s view was that a higher-level question was largely independent of the levels below it, and hence computational problems of the highest level could be analyzed independently of an understanding of the algorithm that performs the computation. Similarly, the algorithmic problem of the second level was thought to be solvable independently of an understanding of its physical implementation. Thus his preferred strategy was top-down rather than bottom-up. At least this was the official doctrine, though in practice downward glances figured significantly in Marr’s attempts to find problem analyses and algorithmic solutions. Ironically, given his advocacy of the top-down strategy, Marr’s work was itself highly influenced by neurobiological considerations: implementation facts constrained his choice of problem and nurtured his computational and algorithmic insights. Publicly, however, the advocacy of the top-down strategy did carry the implication, dismaying for some and comforting for others, that neurobiological facts could be more or less ignored, since they were, after all, just at the implementation level.
Unfortunately, two very different issues were confused in the doctrine of independence. One concerns whether, as a matter of discovery, one can figure out the relevant algorithm and the problem analysis independently of facts about implementation. The other concerns whether, as a matter of formal theory, a given algorithm which is already known to perform a task in a given machine (e.g., the brain) can be implemented in some other machine which has a different architecture. So far as the latter is concerned, what computational theory tells us is that an algorithm can be run on different machines, and in that sense and that sense alone, the algorithm is independent of the implementation. The formal point is straightforward: since an algorithm is formal, no specific physical parameters (e.g., vacuum tubes, Ca2+) are part of the algorithm.
That said, it is important to see that the purely formal point cannot speak to the issue of how best to discover the algorithm in fact used by a given machine, nor how best to arrive at the neurobiologically adequate task analysis. Certainly it cannot tell us that the discovery of the algorithms relevant to cognitive functions will be independent of a detailed understanding of the nervous system. Moreover, it does not tell us that any implementation is as good as any other. And it had better not, since different implementations display enormous differences in speed, size, efficiency, elegance, etc. The formal independence of algorithm from architecture is something we can exploit to build computationally equivalent machines once we know how the brain works, but it is no guide to discovery if we do not know how the brain works.
The issue of independence of levels marks a major conceptual difference between Marr (1982) and the current generation of researchers studying neural and connectionist models. In contrast to the doctrine of independence, current research suggests that considerations of implementation play a vital role in the kinds of algorithms that are devised and the kind of computational insights available to the scientist. Knowledge of brain architecture, far from being irrelevant to the project, can be the essential basis and invaluable catalyst for devising likely and powerful algorithms—algorithms that have a reasonable shot at explaining how in fact the neurons do the job.
Marr’s three-level division treats computation monolithically, as a single kind of level of analysis. Implementation and task-description are likewise each considered as a single level of analysis. Yet when we measure Marr’s three levels of analysis against levels of organization in the nervous system, the fit is poor and confusing at best (Crick, 1979; Churchland and Sejnowski, 1988; Shepherd, 1988). To begin with, there is organized structure at different scales: molecules, synapses, neurons, networks, layers, maps, and systems (figure 20.2). At each structurally specified stratum we can raise the computational question: what does that organization of elements do? What does it contribute to the wider, computational organization of the brain? In addition, there are physiological levels: ion movement, channel configurations, EPSPs (excitatory postsynaptic potentials), IPSPs (inhibitory postsynaptic potentials), action potentials, evoked response potentials, and probably other intervening levels that we have yet to learn about and that involve effects at higher anatomical levels such as networks or systems.
Figure 20.2
Levels of organization in the nervous system, as characterized by Gordon Shepherd (1988).
The range of structural organization implies, therefore, that there are many levels of implementation and that each has its companion task description. But if there are as many types of task descriptions as there are levels of structural organization, this diversity could be reflected in a multiplicity of algorithms that characterize how the tasks are accomplished. This in turn means that the notion of the algorithmic level is as over-simplified as the notion of the implementation level.
Note also that the very same level of organization can be viewed computationally (in terms of functional role) or implementationally (in terms of the substrate for the function), depending on what questions you ask. For example, the details of how an action potential is propagated might, from the point of view of communication between distant areas, be considered an implementation, since it is an all-or-none event and only its timing carries information. However, from a lower structural level—the point of view of ionic distributions—the propagating action potential is a computational construct whose regenerative and repetitive nature is a consequence of several types of nonlinear voltage-dependent ionic channels spatially distributed along an axon.
The focus for this levels concept is the link between anatomy and what is represented in the anatomy. As a first pass, it assumes that the greater the distance from cells responding to sensory input, the higher is the degree of information processing. Thus the level-rank assigned is a function of synaptic distance from the periphery. On this measure, cells in the primary visual area of the neocortex that respond to oriented bars of light are at a higher level than cells in the lateral geniculate nucleus (LGN), which in turn are at a higher level than retinal ganglion cells. Because the nature of the representations and the transformations on the representations are still poorly understood, only the relative level—x is higher or lower than y—rather than the ordinal level—first, second, etc.—is referred to.
Once the sensory information reaches the cerebral cortex, it fans out through cortico-cortical projections into a multitude of parallel streams of processing. In the primate visual system, 25 areas that are predominantly or exclusively visual have been identified (van Essen et al. 1991; figure 20.3). Many (perhaps all) forward projections are matched by a backward projection, and there are even massive feedback projections from primary visual cortex to the LGN. Given these reciprocal projections, the processing hierarchy is anything but a one-way ladder. Even so, by examining the cortical layer into which fibers project, it is possible to find some order in the information flow. Forward projections generally terminate in the middle layers of cortex, and feedback projections usually terminate in the upper and lower layers (Rockland and Pandya, 1979; Maunsell and van Essen, 1983). So far, however, the function of these feedback pathways is not established, though the idea that they have a role in learning, attention, and perceptual recognition is not unreasonable. If higher areas can affect the flow of information through lower areas, then strictly sequential processing cannot be taken for granted.
Figure 20.3
A flattened projection of the cerebral cortex in the right hemisphere of the macaque monkey. Stippling indicates cortical areas implicated in visual processing. (Upper left) Lateral view of macaque brain, showing visual areas. (Lower left) Medial view of macaque brain. (From van Essen and Anderson 1990.)
The organization typical of earlier sensory areas is only approximately, roughly, and incompletely hierarchical.4 Beyond the sensory areas, moreover, not even that much hierarchy is manifest. The anatomy of frontal cortex and other areas beyond the primary sensory areas suggests an information organization more like an Athenian democracy than a Ford assembly line. Hierarchies typically have an apex, and following the analogy, one might expect to find a brain region where all sensory information converges and from which motor commands emerge. It is a striking fact that this is false of the brain. Although there are convergent pathways, the convergence is partial and occurs in many places many times over, and motor control appears to be distributed rather than vested in a command center (Arbib 1989; Altman and Kien 1989; figure 20.4).
Figure 20.4
Model for decision-making in the insect nervous system. In the CNS, stations 1, 2, 3 contain local networks 1, 2, 3. These stations approximate the brain, the subesophageal (SOC), and segmental ganglia of the locust. The output of each station results from a consensus between the activity of the inputs and the local networks in that station, so the output of each station is different. The stations are thus linked in several parallel loops, and the output of the whole system is the consensus of the activity in all the loops. (From Altman and Kien 1989.)
The assumption that there is a sensory-processing hierarchy, if only to a first approximation, affords the possibility of probing the processing stages by linking various behavioral measures, such as task-relative reaction time (RT), to events taking place in the processing hierarchy at different times as measured by cellular response. To put it more crudely, temporal ordering helps determine what is cause and what is effect. Accuracy of response under varying conditions can be measured, and both humans and animals may be subjects. This is an important method for triangulating the brain areas involved in executing a certain task and for determining something about the processing stages of the task. For example, on the physiological side, one may measure the delay between the presentation of a moving target and the first response by motion-sensitive cells in visual area MT, and on the behavioral side one may measure the response latency relative to degrees of noise in the stimulus. One surprise is that the latencies for signals reaching the visual areas in the cortex are so long, relative to the behavioral RT. The latency for MT is about 50–60 msec, and about 100 msec in inferotemporal cortex. Since human RT to a complex object is on the order of 150–200 msec including assembling the motor response, sending the signal down the spinal cord, and activating the muscles, this suggests that surprisingly few processing steps intervene between detection in MT and preparing the response in the motor cortex, striatum, cerebellum, and spinal cord. Such data help constrain theories about the nature of the processing.
By way of illustration, consider a set of experiments by William Newsome and colleagues (1989) in which they show a correlation between the accuracy of the behavioral response to motion detection and the spiking frequency of single neurons responding to motion stimuli in MT (Newsome et al., 1989). In the task, tiny dots move randomly on a TV screen. The monkey is trained to respond as soon as it detects coherent motion, to either the right or the left. Across trials, what varies is the number of dots moving coherently and their direction of motion. The monkey detects direction of motion with as few as four dots moving coherently, and his accuracy improves as the number of dots moving together increases. What about the cells in MT? Suppose one records from a cell that prefers right-going motion. The visual display is set up so that it is matched to the cell’s receptive field, with the result that the experimenter has control of the minimum stimulus needed to produce the maximum response. So long as fewer than four dots move coherently, the cell does not respond. With increasing numbers of dots moving coherently in the cell’s preferred direction, the cell responds more vigorously. Indeed, the accuracy curve displayed in the monkey’s behavior and the spiking-frequency curve displayed by the single cell are, to a first approximation, congruent (figure 20.5). This implies, to put it crudely, that the information contained in the cellular responses of single sensory neurons and the information contained in the behavioral response are roughly on par. It should, however, be kept in mind that the monkeys were very highly trained on this task and that the sensory stimulus was chosen to match the optimal response of each neuron. In a naive monkey, there may not be such close correspondence between the response of the single cell and the overt behavior.
Figure 20.5
(A) Responses of a directionally selective neuron (in visual area MT) at three different motion correlations spanning physiological threshold. Hatched bars represent responses to motion in the neuron’s preferred direction; solid bars indicate responses to motion 180° opposite to the preferred direction. Sixty trials were performed in each direction for each of the three correlation levels. Response distributions for a range of correlation levels were used to compute a “neurometric” function that characterized the neuron’s sensitivity to the motion signal and could be compared with the psychometric function computed from the monkey’s behavioral response. (B) Comparison of simultaneously recorded psychometric and neurometric functions. Open circles indicate psychophysical performance of the monkey; filled circles, performance of the neuron. Psychophysical performance at each correlation is given by the proportion of trials on which the monkey correctly identified the direction of motion. Neuronal performance is calculated from distributions of responses of the directionally sensitive MT neuron. The physiological and psychophysical data form similar curves, but the data for the neuron lie to the left of the data for the monkey, meaning that the neuron was somewhat more sensitive than the monkey. (From Newsome et al. 1989. Reprinted by permission from Nature 341: 52–54. Copyright © 1989 Macmillan Magazines Ltd.)
The next phase of the experiment tests whether the information carried by directionally selective cells found in MT is really used in generating the response. To do this, Newsome and colleagues presented left-going visual stimuli, and at the proper latency they electrically stimulated the column containing cells preferring right-going visual stimuli. How did the animal behave? Would the electrical stimulation demonstrate its effectiveness by overriding, at least sometimes, the visual stimuli? The monkey behaved as though he saw right-going stimuli; more exactly, the electrical stimulus decreased the probability that the animal would respond to the visual stimulus and increased the probability that it would respond as though presented with a stimulus in the opposite direction. This result implies that the cells’ responses—and hence the information carried in those responses—are behaviorally significant.
During the past hundred years, experimental psychologists have assembled an impressive body of RT information, and it is a valuable data base upon which neuroscientists may draw. Thus consider also a set of studies by Requin and colleagues (Requin et al., 1988; Riehle and Requin, 1989). In the first stage, they measured the monkey’s RT where the task was to make a wrist flexion in a certain direction and by a certain amount as indicated by a signal. There were basically three conditions: the monkeys were precued or not, and if they were precued, the cue indicated either the direction or the extent of the movement. Precuing was found to have a large effect on the RT but only a slight effect on the movement time, showing that precuing has its major effect on programming and preparing for the movement, rather than on the speed of execution of the movement. Additionally, if the advance cue specified where but not how much, the RT was shortened more than if the cue specified how much but not where. This suggests that information about extent of movement cannot be efficiently incorporated until the system knows the direction of the movement.
In the second stage, Riehle and Requin investigated the electrophysiological properties of cells in the primary motor cortex (MI) and the premotor cortex (PM). They found execution-related neurons, which were more common in MI, and preparation-related, directionally selective neurons, which were more common in PM. This coheres with other physiological data, and implies that PM probably involves an earlier stage of processing than does MI, since PM has more to do with preparing for the movement than with executing it. Moreover, within the class of preparation-related cells in PM, they found two subclasses: those related to programming the muscle movements, and those related to preprocessing the general components of the movement program. This is another instance of research that narrows down hypotheses about relative order of processing and the structures involved in a distinct aspect of processing by establishing behavioral reaction times and by correlating those data with specific responses of cells.5
What is computation? In virtue of what is something a computer? Why do we say a slide rule is a computer but an egg beater is not? These are, in a way, the philosophical questions of computer science, inasmuch as they query foundational issues that are typically glossed over as researchers get on with their projects. Like the philosophical questions of other disciplines (What is the nature of life? [Biology] What is the nature of substance and change? [Physics and Chemistry]), the answers become more convincing, meaningful, and interconnected as the empirical discipline matures and gives more ballast to the theory. In advance of understanding that there are atoms, how atoms link together, and what their properties are, one simply cannot say a whole lot about the nature of substance and change. It is not, however, that one must say nothing—in that event, one could not get the science started. The point rather is that the theory outlining the elementary ideas of the discipline gradually bootstraps itself up, using empirical discoveries as support, and kicking away old misconceptions in the long haul.
The definition of computation is no more given to us than were the definitions of light, temperature, or force field. While some rough-hewn things can, of course, be said, and usefully said, at this stage, precision and completeness cannot be expected. And that is essentially because there is a lot we do not yet know about computation. Notice in particular that once we understand more about what sort of computers nervous systems are, and how they do whatever it is they do, we shall have an enlarged and deeper understanding of what it is to compute and represent. Notice also that we are not starting from ground zero. Earlier work, especially by Turing (1937, 1950), von Neumann (1951, 1952), Rosenblatt (1961), and McCulloch and Pitts (1943), made important advances in the theory and science of computation. The technological development of serial, digital computers and clever software to run on them was accompanied by productive theoretical inquiry into what sort of business computation is.6
Agreeing that precise definitions are not forthcoming, can we nonetheless give rough and ready answers to the opening questions? First, although we may be guided by the example of a serial digital computer, the notion of “computer” is broader than that. Identifying computers with serial digital computers is neither justified nor edifying, and a more insightful strategy will be to see the conventional digital computer as only a special instance, not as the defining archetype. Second, in the most general sense, we can consider a physical system as a computational system when its physical states can be seen as representing states of some other systems, where transitions between its states can be explained as operations on the representations. The simplest way to think of this is in terms of a mapping between the system’s states and the states of whatever is represented. That is, the physical system is a computational system just in case there is an appropriate (revealing) mapping between the system’s physical states and the elements of the function computed. This “simple” proposal needs quite a lot of unpacking.
Since this hypothesis concerning what makes a physical system a computational system may not be self-evident, let us approach the issue more gradually by first introducing several key but simple mathematical concepts, including “function,” and the distinction between computable and noncomputable functions. To begin, what is a function? A function in the mathematical sense is essentially just a mapping, either 1:1 or many:1, between the elements of one set, called the “domain,” and the elements of another, usually referred to as the “range”7 (figure 20.6). Consequently, a function is a set of ordered pairs, where the first member of the pair is drawn from the domain, and the second element is drawn from the range. A computable function then is a mapping that can be specified in terms of some rule or other, and is generally characterized in terms of what you have to do to the first element to get the second. For example, multiply the first by 2, {(1, 2), (2, 4), (3, 6)}, expressible algebraically as y = 2x; multiply the element from the domain by itself {(6.2, 38.44), (9.6, 92.16)}, expressible algebraically as y = x2, and so on.
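The extensional view of a function, and the rule-governed examples y = 2x and y = x² above, can be sketched in a few lines of Python (an illustrative sketch, not from the text; the helper name `pairs` is our own):

```python
def pairs(rule, domain):
    """Enumerate the ordered pairs (x, rule(x)) of a function over a
    finite domain: the function viewed extensionally, as a set of pairs."""
    return [(x, rule(x)) for x in domain]

# y = 2x over {1, 2, 3}: the text's example {(1, 2), (2, 4), (3, 6)}
double = pairs(lambda x: 2 * x, [1, 2, 3])

# y = x^2 over {6.2, 9.6}: the text's example {(6.2, 38.44), (9.6, 92.16)}
square = pairs(lambda x: x * x, [6.2, 9.6])
```

The rule (the lambda) is the intensional specification; the list of ordered pairs it generates is the function in the extensional, mathematical sense.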
Figure 20.6
Mapping between a domain and a range can be accomplished by a variety of physical systems. There are three steps: (1) The input data is coded into a form appropriate for the physical system (electrical signal in an electrical circuit, chemical concentration in neuron, position of a slider in a slide rule). (2) The physical system shifts into a new state. (3) The output state of the physical system is decoded to produce the result of the mapping. The example shown here is the “average” map that takes four values and produces their average. Such a mapping might be useful as part of a visual system. Mappings could also be made from the domain of temporal sequences, and the range could be a sequence of output values.
What then is a noncomputable function? It is an infinite set of ordered pairs for which no rule can be provided, not merely none known now but none even in principle. Hence its specification consists simply and exactly in the list of ordered pairs. For example, if the elements are randomly associated, then no rule exists to specify the mapping between elements of the domain and elements of the range. Outside of mathematics, people quite reasonably tend to equate “function” with “computable function,” and hence to consider a nonrule mapping to be no function at all. But this is not in fact how mathematicians use the terms, and for good reason, since it is useful to have the notion of a noncomputable function to describe certain mappings. Moreover, it is useful for the issue at hand because it is an empirical question whether brain activity can really be characterized by a computable function or only to a first approximation, or perhaps whether some of its activities cannot be characterized at all in terms of computable functions (Penrose, 1989).
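No finite program can literally exhibit a noncomputable function, but the contrast the text draws, rule-governed versus arbitrarily listed, can be illustrated with a hypothetical sketch (our own construction): one mapping is generated by a rule, the other is specified only by listing its pairs.

```python
import random

random.seed(0)

# Rule-governed (computable): the entire mapping is generated by a rule,
# so a finite description covers infinitely many pairs.
computable = {x: 2 * x for x in range(10)}

# Arbitrary association: the pairs follow no rule the user of the table
# can exploit; the mapping is given only by the listing itself.
# (Pseudo-random numbers are of course still rule-generated; this merely
# illustrates what a rule-less *listing* of pairs looks like.)
arbitrary = {x: random.random() for x in range(10)}
```

For a finite table the distinction is only heuristic; genuine noncomputability requires an infinite set of pairs with no generating rule at all.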
What is a linear function? Intuitively, it is one where the plot of the elements of the ordered pair yields a straight line. A nonlinear function is one where the plot does not yield a straight line (figure 20.7). Thus when brain function is described as “nonlinear,” what this means is that (a) the activity is characterized by a computable function, and (b) that function is nonlinear. Notice also that the space in which functions are plotted may be a two-dimensional space (the x and y axes), but it may, of course, have more than two dimensions (e.g., an x axis, y axis, and also w, v, z, etc. axes).
Figure 20.7
Examples of functions F(x), plotted along the vertical axis, of one variable, x, plotted along the horizontal axis. Function A is a linear function. Function B is a nonlinear function. Function C is a discontinuous function.
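The intuitive straight-line test for linearity can be made operational: over equally spaced sample points, equal steps in x must yield equal steps in f(x). A small sketch (our own, assuming equally spaced samples):

```python
def is_linear_on(f, xs, tol=1e-9):
    """Rough check that f plots as a straight line over the equally
    spaced sample points xs: successive equal steps in x must produce
    equal steps in f(x)."""
    ys = [f(x) for x in xs]
    steps = [b - a for a, b in zip(ys, ys[1:])]
    return all(abs(s - steps[0]) < tol for s in steps)

xs = [0, 1, 2, 3, 4]                          # equally spaced samples
print(is_linear_on(lambda x: 2 * x + 1, xs))  # True: a straight line
print(is_linear_on(lambda x: x * x, xs))      # False: a parabola
```

The same idea extends to more than two dimensions, where "straight line" becomes "flat hyperplane" and the check is applied along each axis.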
Because the notion of a vector simplifies discussion enormously, we introduce it here. A vector is just an ordered set of numbers. For example, the set of incomes for 1990 of three vice-presidents in a corporation can be represented by the vector <$30, $10, $10 >; the eggs laid per week by five hens as <4, 6, 1, 0, 7 >; the spiking frequency of four neurons/sec as <10, 55, 44, 6 >. By contrast, a scalar is a single value rather than a many-valued set. The order in the set matters when we want to operate on the values in the set according to an order-sensitive rule. Systems, including the nervous system, execute functions that perform vector-to-vector mapping. For example, from the stretch receptors’ values to the muscle contraction values, or from the head velocity values to eye velocity values.
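The text's vectors, and a simple vector-to-vector mapping, can be written directly (a minimal sketch; the uniform-gain map `gain` is our illustrative choice, not a mapping the text specifies):

```python
# Vectors from the text: ordered sets of numbers.
incomes = [30, 10, 10]      # incomes of three vice-presidents, <$30, $10, $10>
eggs = [4, 6, 1, 0, 7]      # eggs per week laid by five hens
spikes = [10, 55, 44, 6]    # spiking frequency of four neurons/sec

def gain(vector, k):
    """A toy order-preserving vector-to-vector map: scale every
    component by k (e.g., a uniform gain from receptor values to
    contraction values; an illustrative assumption)."""
    return [k * v for v in vector]

print(gain(spikes, 2))   # [20, 110, 88, 12]
```

Because the mapping is applied componentwise, the order of the set matters: swapping two entries of the input swaps the corresponding entries of the output.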
A geometric articulation of these concepts compounds their value. Any coordinate system defines a state space, and the number of axes will be a function of the number of dimensions included. A state space is the set of all possible vectors. For example, a patient’s body temperature and diastolic blood pressure can be represented as a position in a 2-D state space. Or, if a network has three units, each unit may be considered to define an axis in a 3-D space. The activity of a unit at a time is a point along its axis, so that the global activation of all the units in the net is specified by a point in that 3-D space (figure 20.8). More generally, if a network has n units, then it defines an n-dimensional activation space, and an activation vector can be represented as a point in that state space. A sequence of vectors can be represented as a trajectory in the state space.8 Thus the patient’s body temperature and blood pressure, followed through time, trace out a trajectory in a 2-space. A function maps a point in one state space to a point in another state space—for example, from a point in stretch-receptor activation space to a point in muscle spindle activation space.
Figure 20.8
Schematic diagram of the trajectory of a three-neuron system through state space. The state of the system is a 3-D vector whose components are the firing rates of the three neurons. As the firing rates change with time, the tip of the vector traces out a trajectory (thick line). For more neurons the state space will have a higher dimension.
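The state-space picture can be sketched in a few lines of Python; the activation values, the weights, and the 3-D-to-2-D map below are all invented for illustration:

```python
# The global state of a three-unit network is a point in 3-D activation space.
state = (0.2, 0.9, 0.4)

# A sequence of states is a trajectory through that space (cf. figure 20.8).
trajectory = [(0.2, 0.9, 0.4), (0.3, 0.7, 0.5), (0.5, 0.5, 0.6)]

# A function maps points in one state space to points in another; here, an
# arbitrary linear map from the 3-D space to a 2-D space.
def linear_map(point, weights):
    return tuple(sum(w * x for w, x in zip(row, point)) for row in weights)

W = [(1.0, 0.0, 1.0),
     (0.0, 1.0, -1.0)]  # illustrative weights only
print(linear_map(state, W))  # a point in the 2-D target space
```

For a network of n units the state tuple simply has n components; nothing in the code depends on n being small enough to visualize.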
These notions—“vector” and “state space”—are part of linear algebra, and they are really the core of the mathematics needed to understand model networks. They are mercifully simple conceptually, and they are rather intuitively extendable from easily visualizable 2-D cases to very complex, n-D cases, where n may be thousands or millions. Although volumes more can be written on the topic of linear algebra, this is perhaps enough to ease the entry into the discussion of model neural networks.9
The mathematical interlude was intended to provide a common vocabulary so that we might return to the question of characterizing, albeit roughly, what about a physical system makes it a computer. To pick up the thread left hanging during the mathematical interlude, let us hypothesize that a physical system computes some function f when (1) there is a systematic mapping from states of the system onto the arguments and values of f, and (2) the sequence of intermediate states executes an algorithm for the function.10 Informally, an algorithm is a finite, deterministic procedure, e.g., a recipe for making gingerbread or a rule for finding the square root.
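The square-root rule can serve as a concrete instance of such a finite, deterministic procedure; the sketch below uses Heron's method, with a convergence tolerance chosen arbitrarily:

```python
# Heron's method: a finite, deterministic procedure for the square root.
# Each intermediate state (the current guess) is fully determined by the
# previous one, which is what makes this an algorithm in the text's sense.
def square_root(x, tolerance=1e-10):
    guess = x / 2.0 if x > 1 else 1.0
    while abs(guess * guess - x) > tolerance:
        guess = (guess + x / guess) / 2.0
    return guess

print(square_root(2.0))  # ≈ 1.41421356...
```

On the hypothesis in the text, any physical system whose state sequence maps systematically onto these successive guesses would count as computing the square-root function.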
We count something as a computer because, and only when, its inputs and outputs can usefully and systematically be interpreted as representing the ordered pairs of some function that interests us. Thus there are two components to this criterion: (1) the objective matter of what function(s) describe the behavior of the system, and (2) the subjective and practical matter of whether we care what the function is. This means that delimiting the class of computers is not a sheerly empirical matter, and hence that “computer” is not a natural kind, in the way that, for example, “electron” or “protein” or “mammal” is a natural kind. For categories that do delimit natural kinds, experiments are relevant in deciding whether an item really belongs to the category. Moreover, there are generalizations and laws (natural laws) about the items in the categories and there are theories interleaving the laws. Nonnatural kinds differ in all these respects, and typically have an interest-relative dimension.
“Bee,” for example, is a natural kind, but “gem” and “weed” are not. Objects are considered gems depending on whether some social group puts special value on them, typically as status symbols. Plants are considered weeds depending on whether gardeners (serious gardeners?) in the region happen to like having them in the garden. Some gardeners cultivate baby’s breath as a desirable plant; other gardeners fight it as a weed. There is no experiment that will determine whether baby’s breath is really a weed or not, because there is no fact of the matter—only social or idiosyncratic conventions.11 Similarly, we suggest, there is no intrinsic property necessary and sufficient for all computers, just the interest-relative property that someone sees value in interpreting a system’s states as representing states of some other system, and the properties of the system support such an interpretation. Desktop von Neumann machines exist precisely because we are keenly interested in the functions we build and program them to execute, so the interest-relative component is dyed in the wool. For this reason, and because these machines are so common, they are the prototypical computers, just as dandelions are prototypical weeds. These prototypes should not, however, be mistaken for the category itself.
It may be suggested as a criticism of this very general characterization of computation that it is too general. For in this very wide sense, even a sieve or a threshing machine could be considered a computer, since they sort their inputs into types, and if one wanted to spend the time at it, one could discover a function that describes the input–output behavior. While this observation is correct, it is not so much a criticism as an apt appreciation of the breadth of the notion. It is rather like a lawn-growing perfectionist incredulously pointing out that on our understanding of “weed,” even dandelions might be nonweeds relative to some clime and some tribe of growers. And so, indeed, they might be some farmer’s cash crop. Nor is this idle fancy. Cultivated dandelion greens now appear as a delicacy in the specialty section of the greengrocery.
Conceivably, sieves and threshing machines could be construed as computers if anyone has reason to care about the specific function reflected in their input–output behavior, though it is hard to see what those reasons might be. Unlike desktop computers that are engineered precisely for their computational prowess, sieves and threshing machines are constructed for other reasons, namely their sheerly mechanical prowess in the sorting of objects according to size and shape. Not too much emphasis should be placed on the link between purposeful design and use as a computer, however, for a fortuitously shaped rock can be used as a sundial. This is a truly simple computer-trouvé, but we do have reason to care about the temporal states that its shadow-casting states can be interpreted as representing.
There is perhaps a correct intuition behind the criticism nonetheless. Finding a device sufficiently interesting to warrant the description “computer” probably also entails that its input–output function is rather complex and inobvious, so that discovering the function reveals something important and perhaps unexpected about the real nature of the device and how it works. Thus finding out what is computed by a sieve is probably not very interesting and will not teach us much we did not already know. How a sieve works is dead simple. In contrast, finding out what is computed by the cerebellum will teach us a lot about the nature of the tissue and how it works.
A computer is a physical device with physical states and causal interactions resulting in transitions between those states. Basically, certain of its physical states are arranged such that they represent something, and its state transitions can be interpreted as computational operations on those representations. A slide rule is taken to compute—for example, (Mult 2, 7) to give 14 as the output—by dint of the fact that its physical regularities are set up in such a way as to honor the abstract regularities in the domain of numbers; the system of Aubrey holes at Stonehenge computes eclipses of the sun by dint of the fact that its physical organization and state transitions are set up so that the sun stone, moon stone, and nodal stone land in the same hole exactly when an eclipse of the sun occurs. Notice that this would be so even in the highly unlikely event that Stonehenge was the fortuitous product of landslides and flooding rather than human contrivance.
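The slide-rule case is easy to make explicit: the device's physical regularity (adding lengths) honors an abstract regularity of numbers (adding logarithms multiplies). A sketch, with the physical scale details abstracted away:

```python
import math

# A slide rule multiplies by physically adding two lengths, each
# proportional to a logarithm: log(a) + log(b) = log(a * b).
def slide_rule_multiply(a, b):
    combined_length = math.log10(a) + math.log10(b)  # slide one rule along the other
    return 10 ** combined_length                     # read the product off the scale

print(slide_rule_multiply(2, 7))  # ≈ 14.0
```

The interpretation does the work: the rule's states are just positions, and it is the systematic mapping from positions to numbers that lets us count the device as computing (Mult 2, 7).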
Nervous systems are also physical devices with causal interactions that constitute state transitions. Through slow evolution, rather than miraculous chance or intelligent design, they are configured so that their states represent—the external world, the body they inhabit, and in some instances, parts of the nervous system itself—and their physical state transitions execute computations. A circuit in the mammalian brain stem evolved to compute the next position of the eyeball based on the angular velocity of the head. Briefly, the neuronal activity originating in the semicircular canals represents head velocity, and the interneurons, motor neurons, and eyeball muscles are physically arranged such that for a head velocity of a certain amount, the neurons causally interact so that the muscles of the eyeball change tension by exactly the amount needed to compensate for the head movement. Loosely speaking, this organization evolved “for” this task; a little more strictly speaking, this circuit came to be the way it is by random mutations and natural selection; in standard epigenetic circumstances and relative to the ancestor’s nervous system and to the system’s other components, this organization enhances somewhat the organism’s chances of surviving and reproducing.
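The vestibulo-ocular circuit just described can be caricatured as a one-line vector-to-vector mapping; the gain of -1.0 is an idealization of perfect compensation, not a measured value:

```python
# Idealized vestibulo-ocular reflex: eye velocity cancels head velocity.
# Input and output are vectors (e.g., horizontal and vertical components).
def compensatory_eye_velocity(head_velocity, gain=-1.0):
    return [gain * v for v in head_velocity]

# Head turns 30 deg/s rightward and 5 deg/s upward; the eyes counter-rotate.
print(compensatory_eye_velocity([30.0, 5.0]))  # [-30.0, -5.0]
```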
There is a major contrast between manufactured and biological computers. Since we construct digital computers ourselves, we build the appropriate relationship into their design. Consequently, we tend to take this mapping for granted in computers generally, both manufactured and evolved. But for structures in the nervous system, these relationships have to be discovered. In the case of biological computers, discovery may turn out to be very difficult since we typically do not know what is being computed by a structure, and intuitive folk ideas may be misleading.
By contrast with systems we conventionally call computers, the modus operandi of some devices is such that a purely causal explanation, without reference to anything having been computed or represented, will suffice. A mousetrap or a sieve, for example, is a simple mechanical device. Purely causal explanations will likely suffice for some aspects of brain activity too, such as the ion pump in neuronal membranes by virtue of which sodium is pumped out of the cell, or the manner in which binding of neurochemicals to receptors changes the internal chemistry of the cell. Bear in mind, however, that even at this level an ion, such as Na+, could represent a variable like velocity. At this stage, no one is really convinced that this is in fact so, but the possibility is not ruled out simply because ions are very low-level entities. Effects at higher levels of organization appear to require explanations in terms of computations and representations. Here a purely causal story, even if the line is still fairly clean, would give only a very unsatisfying explanation. For example, a purely causal or mechanical explanation of the integration of signals by dendrites is unenlightening with respect to what information the cell is getting and what it does with it. We need to know what this interaction means in terms of what the patterns of activity represent and what the system is computing.
Consider, for example, the neurons in parietal cortex whose behavior can be explained as computing head-centered coordinates, taking positions of the stimulus on the retina and position of the eyeball in the head as input (Zipser and Andersen, 1988). Knowing that some neurons have a response profile that causes other neurons to respond in a certain way may be useful, especially in testing the computational hypothesis, but on its own it does not tell us anything much about the role of those neurons in the animal’s visual capacity. We need additionally to know what the various states of neurons represent, and how such representations can be transformed by neural interactions into other representations. At the network level, there are examples where the details of connectivity and physiology of the neurons in the network still leave many of the whys and wherefores dangling, while a computational approach that incorporates the physiological details may make contact with the broader brainscape of tasks, solutions, environmental niche, and evolutionary history.12
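At the level of the input–output function only, the computation attributed to these parietal neurons can be approximated as vector addition of retinal position and eye position; the toy function below is not the Zipser–Andersen network itself, just a sketch of the mapping it was trained to realize:

```python
# Toy approximation: a target's head-centered position is (roughly) its
# retinal position plus the eye's position in the head. Values in degrees.
def head_centered(retinal_pos, eye_pos):
    return tuple(r + e for r, e in zip(retinal_pos, eye_pos))

# Stimulus 10 deg right of the fovea while the eye looks 5 deg left:
print(head_centered((10.0, 0.0), (-5.0, 3.0)))  # (5.0, 3.0)
```

The point of the example is the one made in the text: knowing the neurons' response profiles alone is not enough; it is the interpretation of their states as retinal and eye positions that reveals what the circuit is for.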
There is a nonmathematical sense of “function,” according to which the job performed by something is said to be its function. In this sense, the heart is said to function as a pump rather than, say, as a noisemaker to soothe babies on their mother’s breast. Though making a “ka-thump” sound is something the heart does, and though babies appear to be soothed by it, this surely is not the heart’s function, meaning, roughly, its “primary job.” Functional assignments can reasonably be made in the context of evolutionary development, what the animal needs to survive and reproduce, its environmental niche, and what would make sense given the assignment of function to related structures. In this “job” sense of function, the function of some part of the nervous system is to compute some function (in the mathematical sense), such as position for the eyeball given head velocity.
There is nothing mystical about characterizing a biological structure as having a specific function, even though neither god nor man designed the structure with a purpose in mind.13 The teleological trappings are only that, and the teleology is eliminable or reducible without remainder in an evolutionary framework. To assign a computational role to a circuit is to specify a job of that circuit—detecting head velocity, for example. Consequently, the considerations that bear on determining the job of an organ such as the liver bear also on the assignment of computational role to neuronal structures. That the nervous system evolved, and that maladaptive structures tend to be weeded out in the evolutionary contest, restricts many functional hypotheses—in both senses of “functional”—that are logically possible but just not biologically reasonable. The crux of the matter is that many biologically irrelevant computational hypotheses can be culled out by a general functional truth about nervous systems, namely that inter alia they serve to help the animal move adaptively in the world.14
1. William Baxt has developed a net for diagnosis of coronary occlusion in patients presenting with acute anterior chest pain (Baxt, 1990).
2. These levels have been postulated within a single brain. However, as Horace Barlow has suggested (personal communication), the levels diagram might be more accurate if it recognized the interaction between brains and added a step to the topmost level. The argument for the extension is that the interaction between brains is a major factor in what an individual brain can and does do. Natural selection is one form of interaction between brains that gives rise to the general capacities and predispositions of the brain of a given species. Additionally, given the predispositions an individual has at birth, interactions between conspecific brains as well as between contraspecific brains have much to do with the particular capacities, skills, knowledge, and representations an individual brain acquires. For example, the particular language a human learns may empower or limit him: in how he interacts with other humans, how he solves certain kinds of problems, and how he thinks about things even when not explicitly conversing. A dog’s interactions with its human owner can have a profound effect on the animal’s temperament, and perhaps vice versa; seals growing up in an environment with killer whales and those without killer whales are likely to have a differently configured “predator space.” Although we agree with Barlow that the social level is very important, it has not been the main focus of this book.
3. The original conception of levels of analysis can be found in Marr and Poggio (1976, 1977). While Marr (1982) emphasized the importance of the computational level, the notion of a hierarchy of levels grew out of earlier work by Reichardt and Poggio (1976) on the visual control of orientation in the fly. In a sense, the current view on the interaction between levels is not so much a departure from the earlier views as a return to the practice previously established by Reichardt, Poggio, and even by Marr himself, who published a series of papers on neural network models of the cerebellar cortex and cerebral cortex (see, for example, Marr 1969, 1970). The emphasis on the computational level has nonetheless had an important influence on the problems and issues that concern the current generation of neural and connectionist models (Sejnowski et al., 1988).
4. For data that undermine the idea that there is a neat processing hierarchy, see especially Malpeli (1983); Malpeli et al. (1986); Mignard and Malpeli (1991).
5. For a small sample of other reaction time experiments, see also Cooper and Shepard (1973); Posner (1978); Eriksen and St. James (1986); Rosenbaum et al. (1987); Treisman (1988).
6. Specifically, what we wish to defuse is the gripe that until exact definitions of computation and computer are formulated, computational neuroscience is a mere pretender that must stop dead in its tracks. In the absence of precisely specified necessary and sufficient conditions, it might be argued, we do not know what we mean, and hence we do not know what we are talking about. But this reasoning is fallacious. Like any science, computational neuroscience begins with primitive ideas and crude results, and it can use its discoveries to revise and redefine its basic concepts, where the revised concepts can serve as a platform for the next research foray. To recognize this sort of co-evolution of theory and experiment is not to sully precision, for precision based on a genuine basis of evidence and theoretical coherence is greatly to be prized. But it is to resist phony precision, precision that cobbles together a contrived definition before the science can begin to decide between this definition and dozens of alternatives. It is to embrace what philosophers and historians of science have slowly come to realize: that the science essentially leads, rather than follows, definitions. (Daniel Dennett wittily dubs the eagerness to forge definitions in advance of adequate data the “heartbreak of premature definition.”) And it is to embrace what many cognitive psychologists and linguists suspect: that typically we learn concepts, including rather abstract concepts, by being presented with good examples of the type. We then expand the range of application as a function of further experience and pragmatics. (See Lakoff 1987; P. M. Churchland 1989.)
7. The one mapping that is generally not considered a function is a 1:many mapping.
8. An alternative is to include time as one of the axes so that the state evolves along a state-time trajectory, in analogy with space-time trajectories in physics.
9. See also Jordan (1986b).
10. For further discussion, see Cummins and Schwarz (1991).
11. Even the notion of “disease,” which may seem to be a natural kind, does have an interest relative component. For a remarkable paper in the history of medicine, “The disease of masturbation: values and the concept of disease,” see Tristram Engelhardt (1974).
12. Determining which function a network computes and what its states represent are by no means straightforward tasks. … One worry to allay here is that there can be unresolvable ambiguities for the simple reason that the computational hypothesis is always underdetermined by the evidence. In other words, for any given computational hypothesis, others are also possible (though sometimes just barely possible) relative to the same body of evidence. For example, one typically assumes that the line connecting two data points is continuous with earlier segments of the line, but that is an assumption, not a hidebound necessity. The underdetermination is not, we submit, cause for despair. At least, it is no more problematic than is the assignment of functions to other structures in biological systems, such as hearts and kidneys. Underdeterminacy of hypotheses by the evidence is a rather philosophical consideration, in the lighthearted sense that it is a “don’t-worry-now” problem. Ultimately it needs to be addressed because it is puzzling and nontrivial, but in fact most research proceeds perfectly well without paralyzing reflection on this problem.
13. For discussion on this point, see Millikan (1984) and Mitchell (1989).
14. If certain neurons are connected by pathways to transducers for light, and if they respond selectively to events in the visual field, there is a presumption in favor of supposing their job is to process visual information about the outside world. Even if a strangely gifted mathematician could find a completely different function which equally accommodated the neuronal data, say a function that computes the high and low tides in MacFarhlane Cove on the west side of Queen Charlotte Island, this would be merely a coincidental, sheerly fortuitous concurrence, like someone happening to have freckles on her back that can be read as the Tenth Commandment.
Fiona Cowie and James Woodward
2003
Of course our minds and brains evolved by natural selection! They aren’t the result of divine intervention or fabrication by space aliens. Nor are they solely products of drift or any other naturalistic alternative to selection. That natural selection profoundly “shaped” the mind and brain is accepted both by evolutionary psychologists and by virtually all of their most vigorous critics.
What, then, is at issue in the debate surrounding evolutionary psychology (hereafter, “EP”)? First, there are disagreements about the likely intellectual payoffs of EP’s characteristic research strategy. EP employs a ‘reverse engineering’ methodology: the researcher (i) notes some competence or behavior, (ii) conjectures that it is a solution to some ‘adaptive problem’ faced by our tree- or savanna-dwelling ancestors, and then (iii) proposes that natural selection engineered a specialized psychological mechanism or ‘module’ to produce that competence or behavior. Some EP researchers also offer (iv) behavioral or psychological evidence for the proposed module, but, as we shall see, this evidence is rarely compelling, and other relevant evidence (from, e.g., neurobiology, genetics, or developmental biology) is often not cited. Critics of EP, like us, think that this methodology is unlikely to yield much insight.
We also dispute EP’s views about the structure of the human mind, the way in which it develops, and the relation between evolution and mental architecture. Evolutionary psychologists claim that the mind is ‘massively modular.’ It is composed of a variety of more or less independent ‘organs,’ each of which is devoted to the performance of a particular kind of task, and each of which develops in a largely genetically-determined manner. EP’s hypothesis of massive mental modularity is not just the uncontroversial idea that the mind/brain consists, at some level of analysis, of components that operate according to distinctive principles. For as we discuss in §21.4, EP endows its modules with a number of additional properties such as informational encapsulation (§21.4.3) and independent evolvability (§21.4.4). In addition, EP also makes specific claims about which modules we have. Thus, the modules at issue in EP are not, e.g., small groups of neurons, but are rather the complex processing structures that underlie high-level cognitive tasks like ‘cheater detection.’
EP’s views about mental structure and development are motivated by two very general evolutionary considerations. First, EP holds that evolution is likely to have favored strongly modular mental architectures. Second, and relatedly, EP holds that mental modules are the fairly direct products of natural selection. This requires that the different modules must be independently evolvable: they must have independent genetic bases so that natural selection can act to change one module independently of the others. It also means that while EP theorists are careful to say in their ‘official’ pronouncements that they allow a role for learning and other environmental influences, their more detailed arguments typically assume that the development of modules is tightly genetically constrained.
There are problems with all of these assumptions. First, there is no reason to think that evolution ‘must’ produce modular minds. Evolutionary psychologists (e.g., Cosmides and Tooby, 1994; Tooby and Cosmides, 1990; Carruthers, 2004) argue that general-purpose psychological mechanisms would not have evolved because they are too slow and require too much background knowledge and computational space for the making of life-or-death judgments. Specialized modules, on the other hand, deliver fast and economical decisions on matters crucially affecting an organism’s fitness, so would have been preferred by natural selection. However, it is simply wrong to suppose that modules are invariably (or even usually) superior to general-purpose devices. What sorts of mental organization will be favored by selection depends entirely on the details of the selection pressures an organism is subject to and its genetic structure. As Sober (1994) shows, such factors as how variable the environment is, the costs of making various sorts of mistakes, the costs of building various sorts of discriminative abilities into the organism, etc., can have large effects on the relative fitnesses of general-purpose vs. more specialized strategies. In addition, the ability to adapt quickly (i.e., within the individual’s lifetime) to changing circumstances is vital for organisms inhabiting unstable environments (Maynard Smith et al., 1985). Indeed, there is evidence that both the physical (in particular, climatic) and social environments inhabited by early hominids were highly unstable (Potts, 1996, 1998; Allman, 1999). There thus would have been considerable selective pressure favoring the evolution of cognitive mechanisms allowing the rapid assimilation of new information and behavioral flexibility, rather than innately-specified modules. (For more on this issue, see §21.3 below, and Woodward and Cowie, 2005.)
Secondly, EP’s view that the modules existing in the adult mind are largely genetically specified (or are the products of learning mechanisms that are themselves genetically constrained to produce a particular module as output) is inconsistent with what is known about the role of experience-dependent learning and development in shaping the mature mind. As we argue in §21.2.2, whatever modular processing mechanisms the adult mind contains emerge from a complex developmental process. Less modular structures and capacities that are present in infants interact both with the environment and with the genes to generate (or be transformed into) new competences that were not directly selected for.
Thirdly, the notion of a module is itself quite unclear. As we show in §21.4, there are several different (and non-coextensive) criteria for modularity employed in the EP literature. Researchers move back and forth among different notions of modularity, illicitly taking evidence for modularity in one sense to bear on modularity in other senses. They also tend to conflate issues about the modularity of processing in the adult mind with quite separate issues to do with the role of modules in development and learning.
These unclarities make EP’s claim that the mind is ‘a system of modules’ somewhat difficult to assess. Our view, defended in §21.3, is that the mind is not just a collection of specialized modules. Although our minds probably do contain modules in some sense(s) of that term, these structures are unlikely to correspond to the modules (for cheater detection, mate selection, predator avoidance, etc.) postulated in EP.
EP is premised on the idea that modern human mental organization is a more or less direct reflection of the ways in which hominids evolved to solve the problems posed by their physical and social environments. Thus, by reflecting on the tasks our ancestors must have been able to solve, and by supposing that whatever psychological abilities enabled our ancestors to solve those tasks would have been selected for, evolutionary psychologists seek to map our current psychological organization. Because they also assume that selection engineers a proprietary solution for each of these ‘adaptive tasks,’ evolutionary psychologists see the modern mind as ‘massively modular’: it contains numerous specific mechanisms (or ‘modules’) which evolved for specific tasks and houses few (if any) general-purpose psychological mechanisms.
One problem with this strategy has already been mentioned: it ignores the possibility that flexibility might well have been at a selective premium for hominids inhabiting rapidly changing environments. In this section, we discuss three further problems with EP’s adaptationist or ‘reverse engineering’ approach to generating psychological hypotheses.
EP believes that since “form follows function” (Tooby and Cosmides, 1997, p. 13), one can figure out what the mind is like just by considering what it does (or rather, what our ancestors’ minds did, back in the ‘environment of evolutionary adaptation’ (“EEA”)).
One reason that EP’s reverse engineering strategy is misguided is that you can’t infer structure from function alone. Instead, formulating and confirming functional and structural hypotheses are highly interrelated endeavors, with information about structure informing hypotheses about function and vice versa.
As an illustration, consider how our thinking about human declarative memory has evolved over the last half century (cf. LeDoux, 1996, chapter 7). By the 1940s, neurophysiologists had concluded that memory is distributed over the whole brain, not localized in a particular region. [A structural hypothesis.] But then came the patient HM, who had had much of both temporal lobes removed to treat severe epilepsy. Post-operatively, HM remembered much of what had happened to him prior to the surgery and could form new short-term memories lasting a few seconds. However, he was unable to form new long-term memories. HM thus indicated that short-term memory and long-term memory are distinct [a functional hypothesis], that they are supported by different brain systems [a structural hypothesis], and that the areas responsible for the formation of new long-term memories are different from those allowing storage of the old ones [structural]. Also prior to this, the limbic system (including the hippocampus and amygdala) had been thought to comprise the emotional circuitry of the brain [functional]. But the hippocampus was one of the areas most severely compromised in HM and in other patients with severe memory deficits [structural], indicating that the limbic system was also involved in cognitive functioning [functional] and suggesting that the hippocampus was the seat of memory [structural]. As the workings of the hippocampus were further investigated [structural], it was found to be especially implicated in learning and memory of spatial information [functional]. Further, since all of the patients on whom the early hippocampal memory story had been based had also had damage to the amygdala [structural], this was an indication that the amygdala was also involved in memory [functional]. (This latter claim is still controversial [functional], given that later studies have shown that hippocampal lesions alone will produce amnesia [structural].)
This vignette illustrates how views about functions are (or should be) highly sensitive to structural information. It thus underscores the naïveté of the assumption (endemic in EP) that one can accurately individuate psychological functions by enumerating the tasks that the mind can perform. Evolutionary psychologists try to avoid this difficulty by inferring functions not (or not just) from behavioral data about what our minds can do at present, but rather from their ideas about which psychological capacities were selected for back in the EEA. In effect, then, evolutionary psychologists think of psychological functions as biological functions (in the sense of Wright, 1973): capacities that the mind had in the past that are still present because they were selected for, rather than as functions in the sense of what the mind does at present, regardless of whether they were selected for (causal role functions in the sense of Cummins, 1975).
Prima facie, however, this move compounds, rather than solves, the problem just discussed. After all, if it’s hard to delineate the functional anatomy of our own minds on the basis of merely behavioral evidence, it’s even harder to limn the minds of our ancestors by speculating about what they did and the selection pressures they faced: biological functions are typically tougher to figure out than causal role functions. For one thing, as Lewontin has repeatedly pointed out (e.g., 1990), cognitive functions leave no unambiguous marks on the hominid fossil record, and humans have no close living relatives whose homologous psychological capacities might allow inferences about ancestral functioning. In addition, as Stotz and Griffiths (2002) argue, the evolutionary or ‘adaptive’ problems faced by an organism cannot be specified independently of the organism’s capacities (and/or the structures that underlie those capacities). If you didn’t know, to take their example, that a given fossil bird had a reinforced beak and skull (like a modern woodpecker), you would be unable to reconstruct its niche (living in a forest), its habits (eating insects living under the bark of trees) or the adaptive problems (getting at the insects) and selection pressures it faced. In the absence of detailed knowledge of what the mind is actually like, speculating about the adaptive problems faced by hominids in the EEA is like speculating about the niche and feeding habits of a headless fossil bird. Thus, EP’s strategy of inferring the mind’s functional architecture from speculations about its biological function(s) is seriously off track.
The epistemological problems just outlined are quite endemic to adaptationist reasoning about the mind. However, there is a second problem with EP’s view of the relation between structure and function: EP assumes that once a psychological function is somehow identified, it is legitimate to postulate a single mechanism—a ‘module’—that performs that function. As Carruthers puts it:
…in biology generally, distinct functions predict distinct mechanisms to fulfill those functions…[Hence] one should expect that distinct mental functions—estimating numerosity, predicting the effects of a collision, reasoning about the states of another person, and so on—are likely to be realized in distinct cognitive learning mechanisms… (2004, p. 300)
This ‘one to one’ assumption is not a dispensable part of EP methodology. If a single mechanism could subserve many different functions or if a single function required the cooperation of a number of different mechanisms, then the characteristic EP procedure of inferring mechanisms from functions would be undermined. For in that case, there would be many different alternative hypotheses about the mechanisms involved in the performance a given function, and the identification of the function itself would provide no evidence about which of these alternatives was correct. The one-to-one hypothesis avoids this difficulty by assuming that the only possibility is that a distinct mechanism performs each function.
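The epistemic role the one-to-one assumption plays here can be put in rough quantitative terms. The sketch below is our own illustration (the `hypotheses_per_function` helper is invented): under the one-to-one assumption, identifying a function pins down a unique mechanism, whereas if any nonempty subset of m candidate mechanisms might jointly subserve a function, identifying the function leaves 2^m − 1 structural hypotheses open.

```python
# Counting the structural hypotheses left open once a function is identified.
# A purely schematic model, not a claim about actual cognitive architecture.

def hypotheses_per_function(n_mechanisms: int, one_to_one: bool) -> int:
    if one_to_one:
        # Exactly one dedicated mechanism per function: no alternatives remain.
        return 1
    # Otherwise any nonempty subset of the mechanisms might jointly
    # subserve the function: 2^m - 1 live candidate hypotheses.
    return 2 ** n_mechanisms - 1

for m in (5, 10, 20):
    print(f"{m} candidate mechanisms: "
          f"one-to-one leaves {hypotheses_per_function(m, True)} hypothesis, "
          f"many-to-many leaves {hypotheses_per_function(m, False):,}")
```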
Given the central role played in the EP methodology by the ‘one-to-one’ assumption, it is then a real problem for EP that this assumption embodies a serious misapprehension about how natural selection works. Far from “characteristically [operating] by ‘bolting on’ new special-purpose items to the existing repertoire” (Carruthers, 2004, p. 300), natural selection usually operates by jury-rigging what is already there to perform new tasks instead of (or in addition to) the old ones. Feathers originally evolved for thermal regulation, and subsequently were exapted for flight and mating displays as well. Vertebrate limbs originally evolved for swimming, and subsequently were fitted for walking, climbing, flying and manipulation. At the genetic level, too, exaptation and multifunctionality are common, both within organisms and across species. The Hox genes that control the development of a chicken’s legs and feet, for instance, also control the development of its wings. Moreover, the self-same genes are responsible (with only very minor changes in sequencing) for limb development in all tetrapods—wings, claws, paws, flippers, flukes and hands all have the same genetic origins (Davidson, 2001: 167–76; Gilbert, 2000: 503–21).
Exaptation and multifunctionality are also features of the mind and brain. If a given mechanism M1 carries out some task, T1, and in so doing processes information that is relevant to some other task, T2, then M1 could well be selected because of its role in performing T2 in addition to T1. For example, the processes of object identification may generate information that is relevant to depth perception. If so, those processes may be recruited for both functions and we’d have two functions utilizing a single mechanism. On the other hand, what is intuitively a single task may involve multiple mechanisms cobbled together over time: T2 may involve M2 and M3 in addition to M1. Depth perception looks like this: mechanisms that are at least partly distinct, both anatomically and phylogenetically, are involved in the processing of the various depth ‘cues’ such as binocular disparity, occlusion, texture gradients, etc.
The reuse of old materials for new purposes, with all the redundancy and ad hoc interconnectedness that it implies, is characteristic of selection’s ‘tinkering’ mode of operation. Because natural selection typically does not operate by designing new, single-purpose devices to solve new environmental challenges, EP’s one-to-one assumption is highly dubious.
Another problem with the one-to-one assumption concerns EP’s individuation of functions or tasks. Consider the detection of numerosity. How should we decide whether this is one psychological function subserved by a single module (as Carruthers assumes, 2004, p. 300) or several functions subserved by several modules? The detection of numerosity, after all, is actually a highly complex task. It involves (e.g.) object detection and individuation, which involve (e.g.) depth and edge perception, which involve (e.g.) perception of luminance and color boundaries…etc. Detecting numerosity is a function carried out by the performance of other, simpler functions: functions are nested. They are also shared. Just as the detection of numerosity itself can play a role in higher-level functions (say, performing a task in a psychology experiment), all of the lower-level functions just discussed play roles in the performance of other tasks: depth perception also subserves motion detection; perception of color boundaries subserves depth perception; object individuation subserves object recognition, etc. Given that functions are both nested and shared in this manner, it is hard to see how evolutionary psychologists—relying only on the one-to-one assumption and eschewing the sorts of detailed investigations into neural and cognitive mechanisms described in §21.2.1—could have any principled reason for saying that a given function (like the perception of numerosity) is carried out by one module or many. Similarly for face-recognition, cheater detection, and the various other capacities that are the focus of EP theorizing.
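The nesting and sharing just described can be pictured as a small dependency graph. The mapping below is our own simplified reconstruction of the examples in the text (the function names and edges are illustrative, not an empirical model of perception):

```python
# Each function is carried out via the simpler functions it maps to (nesting);
# a function with several parents is shared across tasks.
SUBSERVES = {
    "numerosity detection": ["object detection", "object individuation"],
    "object individuation": ["depth perception", "edge perception"],
    "object recognition": ["object individuation"],
    "motion detection": ["depth perception"],
    "depth perception": ["luminance boundaries", "color boundaries"],
}

def shared_by(target: str) -> list[str]:
    """Higher-level functions that directly rely on `target`."""
    return sorted(f for f, subs in SUBSERVES.items() if target in subs)

print(shared_by("depth perception"))      # -> ['motion detection', 'object individuation']
print(shared_by("object individuation"))  # -> ['numerosity detection', 'object recognition']
```

Given a graph like this, there is no non-arbitrary answer to the question “how many modules?”: carving at ‘numerosity detection’ yields one, carving at its constituents yields several, and the shared nodes belong to every task that uses them.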
The observations in §§21.2.1 and 21.2.2 clearly undermine EP’s assumptions that mechanisms or modules and functions correspond in a neat 1:1 manner and that, as a result, the existence of modules can be inferred from a specification of the tasks the mind performs. Of course, one could read EP as simply stipulating a notion of ‘module’ such that each function is ipso facto performed by one and only one module. But such a reading of EP’s structural hypotheses trivializes them. In addition, this ‘thin’ interpretation of what a module is, is inconsistent with the fact that the modules postulated in EP are virtually always assumed to have other properties, such as being independent targets of selection, being independently disruptable, being informationally encapsulated, and so on. (See §21.4.)
Another crucial limitation of EP’s methodology is its misunderstanding of the role of learning and development in shaping the mature mind. It’s not that evolutionary psychologists assign no role at all to learning and development. It is rather that they think of these processes as strongly genetically pre-specified. Not only does this ‘preformationist’ picture have little empirical support, it engenders a crucial misspecification in the EP literature of what stands in need of adaptive explanation.
Evolutionary psychologists take some behavior or capacity possessed by mature humans—say, mate preferences, or cheater detection, or the desire to rape—and then proceed to give an adaptive explanation of the postulated mechanism underlying that behavior or capacity (cf., e.g., Thornhill and Thornhill, 1987, 1992 on rape; Wright, 1994 on family relations). But if learning plays an important role in the acquisition of these mechanisms or behaviors, then what really needs adaptive explanation is the processes underlying the development of those mechanisms.
Admittedly, some evolutionary psychologists do see their task as involving the explanation of development—see Carruthers’ emphasis on “evolved learning mechanisms” as giving rise to various modules (2004, pp. 300, 307). However, the assumption here seems to be that if some competence (and the module, M, underlying it) are adaptations built by natural selection, then either (i) the unfolding of M is directly genetically pre-specified; or (ii) M is produced by a ‘learning module,’ L, which is itself built by the genes and tightly constrained to produce M as its output. On this view, the relationship between L and M is very direct: to the extent that experience plays any role at all in the development of M, it merely serves to ‘trigger’ a cascade of effects in L, the outcome of which is tightly genetically constrained.
However, there are a number of serious flaws in this reasoning, even assuming that a given processing module M in the adult mind is indeed an adaptation built by natural selection. First, as a number of psychologists, biologists and philosophers of biology have emphasized, adaptive traits may be ‘coded for’ in the environment. (Cf. Oyama, 1985; Sterelny and Griffiths, 1999; Sterelny, 2003). That is, instead of building M into the genes (either directly or indirectly via learning mechanism L), natural selection may have given us dispositions to construct an environment E in which M would arise as a result of learning and/or other developmental mechanisms which are not genetically determined to produce M. For example, rather than building in a ‘folk psychology’ module, evolution may have given us dispositions to create the kinds of social and familial environments in which children’s generalized developmental and learning abilities enable them to acquire knowledge of other minds.
A second problem here concerns the relation of current evidence from neurobiology and genetics to EP’s assumption that modules like M or L are “innate or innately channeled” (Carruthers, 2004, p. 304). Several writers (e.g., Bates et al., 1998) have advanced a simple counting argument against the notion that numerous cognitive modules (with all their detailed representations and complex algorithms) are genetically specified. Human beings have approximately 30,000–70,000 genes (Venter et al., 2001; Shouse, 2002). By contrast, there are an estimated 10^14 synaptic connections in the brain. Thus, it is argued, there are too few genes by many orders of magnitude to code for or specify even a small portion of these connections.
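The arithmetic behind the counting argument is easy to spell out. The figures below are the rough estimates cited in the text; the calculation itself is our own illustration, not Bates et al.’s:

```python
import math

GENES_LOW, GENES_HIGH = 30_000, 70_000  # approximate human gene count (Venter et al., 2001)
SYNAPSES = 10 ** 14                     # estimated synaptic connections in the brain

# Even on the generous assumption that every gene independently fixed
# connections, each gene would have to specify over a billion of them:
connections_per_gene = SYNAPSES / GENES_HIGH
print(f"connections per gene (best case): {connections_per_gene:.1e}")

# The two quantities are separated by roughly nine to ten orders of magnitude:
gap = math.log10(SYNAPSES) - math.log10(GENES_LOW)
print(f"orders of magnitude separating them: {gap:.1f}")
```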
We find this argument suggestive but not decisive. The role of regulatory genes and networks in governing the expression of structural genes probably generates many more combinatorial possibilities than the figure of 30,000 genes suggests. Still, the counting argument does draw attention to the need for evolutionary psychologists to explain, consistently with what is known about brain development, how cognitive modules could be genetically specified. This, we think, is a non-trivial task, especially vis-à-vis the cerebral cortex, which is known to play a central role in the sorts of high-level cognitive tasks (like language acquisition, cheater detection, theory of mind, etc.) that figure in EP theorizing. For while the gross architectural features of the cortex do appear to be genetically specified, there is considerable evidence that the cortex is in other respects initially relatively undifferentiated and equipotent. In particular, the patterns of synaptic and dendritic connections that develop in different cortical areas—and presumably correspond to the representations (of syntax, folk psychology, etc.) which EP’s modules contain—are very heavily influenced by sensory inputs, and influenced in a way that the evolutionary psychologist’s ‘triggering’ metaphor seems ill equipped to capture. Indeed, many areas of cortex have the capacity to acquire fundamentally different sorts of representations depending on experience. For instance, the cortical areas normally devoted to visual processing in sighted subjects are used for tactile tasks, such as Braille reading, in congenitally blind subjects, and auditory cortex is recruited for processing sign language in deaf subjects (e.g., Büchel et al., 1998; Nishimura et al., 1999).
This phenomenon of ‘cross-modal plasticity’ makes it very hard to see how the cortex could contain innate representations specialized for specific cognitive or learning tasks, and undermines EP’s notion that the development of cognitive modules like M or L is genetically driven. We think that until we hear more about the ways in which the genetic and regulatory mechanisms needed to build the mental modules postulated in EP actually work, we are entitled to view EP’s developmental story—or, really, its lack of such a story—with suspicion.
Such suspicions are reinforced by consideration of a final shortcoming of EP’s reverse engineering strategy, namely, its blindness to the fact that many psychological traits may not be susceptible of direct Darwinian explanation at all. First, while we grant Carruthers’ point (2004, p. 294) that the entire mind is unlikely to be the product of drift or some other non-selective process, it’s by no means impossible that particular psychological mechanisms might be the results of such processes. Developmental, allometric and physio-chemical factors are all known to play significant roles in neural functioning and organization, and may well turn out to be responsible for some psychological traits as well.
Alternatively, some psychological mechanisms might be ‘spandrels’ in the sense of Gould and Lewontin (1979). That is, they might be lucky byproducts of traits that evolved for other purposes. There’s evidence, for instance, that our capacity to organize continuous acoustical signals into linguistically-relevant segments (phonemes) is a byproduct of the way that mammalian brains happen to have evolved to process auditory information. Of course, byproducts that happen to be advantageous may themselves be subject to positive selection pressure—they may become ‘secondary adaptations.’ But the possibilities that psychological mechanisms are spandrels or mere secondary adaptations undermine, in different ways, EP’s assumption that each psychological mechanism is built to order to solve a distinct adaptive problem. The spandrels possibility puts into doubt EP’s assumption that modules are optimal or near-optimal solutions to adaptive problems: a turtle’s fins may be optimized for propelling a heavy body through water, but they are far from optimal means of crossing the sand at nesting time. And the possibility that some mental mechanisms are exaptations further undermines EP’s one-to-one assumption, discussed in §21.2.2: complex exaptations (like, arguably, the human capacity for language or cheater detection) are often cobbled together from multiple mechanisms that are designed (and still used) for other purposes. While one can certainly call such complex secondary adaptations single mechanisms or modules, it’s unclear that they can be attributed the other features commonly ascribed to modules, such as informational encapsulation or independent disruptability. (See below, §§21.4.3, 21.4.4.)
EP claims not just that the mind contains various mental modules, but that it is a system of modules. In this section, we examine the arguments for this claim. (We assume here, for the sake of argument, that the notion of a ‘module’ is relatively clear. This assumption will be criticized in §21.4.)
The main argument for the claim that the mind is a system of modules is originally due to John Tooby and Leda Cosmides. They claim that domain-specific modules would inevitably be selected for because relatively content-independent (or general-purpose) architectures are in principle not viable objects of selection (e.g., Cosmides and Tooby, 1992b, 1994; Tooby and Cosmides, 1990, 1992; see also Samuels, 1998 for a forceful statement of EP’s ‘massive modularity’ hypothesis). There are two arguments given for this claim. First, general learning mechanisms face the ‘Frame Problem.’ Unless the factors relevant to a problem are delineated in advance, general-purpose inference mechanisms face a massive combinatorial explosion—and their owners get eaten before they can reproduce. (See §21.3.1.) Secondly, Chomsky’s poverty of the stimulus argument for the existence of a language-learning module is generalized to show that general-purpose inference is ineffective in the face of any learning problem. For one thing, there will always be more hypotheses compatible with the available data than the learner can effectively test. For another thing, testing is itself problematic. There are no domain-neutral criteria for success: evaluating foraging strategies involves different measures from those used to test hypotheses about cheaters. Worse, there are some hypotheses and strategies that an individual cannot evaluate at all—mate selection strategies would be an example, assuming, of course, that the appropriate measure here is inclusive fitness. (See §21.3.2.) The upshot is that hominids equipped only with general-purpose inference or learning mechanisms wouldn’t have survived in the EEA. Additional constraints on learning mechanisms are clearly needed, and those are what modular architectures supply.
Fodor (1983;另见本卷第 9 章)认为,许多或大多数认知(或“中央”)过程都是非模块化的,因为推理、深思熟虑和规划等必须能够潜在地访问主体所知道的一切。他认识到,这意味着这类非模块化过程会受到所谓“框架问题”的影响——即指定哪些信息与哪个问题相关的问题——因此他推测,它们将难以成为认知科学研究的对象。进化心理学家的悲观甚至比 Fodor 更深:他们认为框架问题不仅是对中央处理器进行理论化的障碍,更是其存在本身的障碍!例如,Carruthers (2004, p. 303) 认为“任何处理器如果必须访问主体的全部背景信念……都将面临难以控制的组合爆炸”,并因此得出结论:“心智……由一组处理系统组成,这些系统……独立于心智其他地方可用的大部分信息运行。”因此,EP 通过假设决策和行为背后的过程是模块化的,从而(消)解了框架问题:这些过程既没有也不需要访问主体的大部分信念和愿望。
Fodor (1983; see also Chapter 9, this volume) maintained that many or most cognitive (or ‘central’) processes are non-modular, since reasoning, deliberation and planning etc. must potentially have access to everything an agent knows. He recognized that this meant that such non-modular processes are subject to the so-called ‘frame problem’—the problem of specifying what information is relevant to which problem—and for this reason, speculated that they would prove unamenable to cognitive-scientific investigation. The pessimism of evolutionary psychologists is deeper even than Fodor’s: they view the frame problem not just as an obstacle to theorizing about central processors, but rather to their very existence! Carruthers (2004, p. 303), for instance, argues that “any processor which had to access the full set of the agent’s background beliefs…would be faced with an unmanageable combinatorial explosion” and hence concludes that “the mind…consist[s] of a set of processing systems which…operate in isolation from most of the information which is available elsewhere in the mind.” EP thus (dis)solves the frame problem by assuming that the processes underlying decision-making and behavior are modular: they neither have nor need access to the bulk of the agent’s beliefs and desires.
这是否是框架问题的令人满意的解决方案取决于人们如何看待这个问题。如果人类的推理、审议和计划过程可以在不了解大量主体的信念和愿望的情况下产生令人满意的决策和行为,那么这确实是支持模块化图景的一个重要观点。然而,在许多情况下,显然,如果没有这样的了解,推理等就无法产生哪怕是最低限度令人满意的决策和行为——例如,考虑一下影响与同类合作决策的一系列因素。然而,如果是这样,那么 EP 声称已经解决了框架问题的说法就站不住脚了,模块化主义者必须面对这样一个问题:我们的推理、审议和计划过程如何能够了解如此多且多种多样的背景信念和愿望。大概,进化心理学家在这里不能调用一个单一的、硬连线的“决策模块”,因为自然选择显然无法预测我们一生中可能面临的所有决策;此外,与这些决策相关的信念和愿望会随环境而变化,因此无法预先指定。假设有人建议一组封装的模块协作规划和执行复杂的动作。在这种情况下,我们必须问它们的操作是如何协调的。似乎有两种选择。一种是存在固定的模块层次结构,每个模块都将其输出发送到层次结构中的下一个模块,依此类推,直到输出行为命令。或者,存在某种“模块集成模块”(Samuels 等人,1999 年毫不讽刺地称之为“资源分配模块”),它获取各种低级模块的输出,对其进行评估,并发出相同的行为指令——Carruthers(2004 年,第 15.6 节)提出“现有模块……自然语言能力”(第 307 页)执行这项整合任务。
Whether this is a satisfactory solution to the frame problem depends on what one takes that problem to be. If human reasoning, deliberation and planning processes can generate satisfactory decisions and behavior without access to large numbers of the agent’s beliefs and desires, then this will indeed be an important point in favor of the modularist picture. However, it seems plain that in many cases, reasoning etc. cannot issue in even minimally satisfactory decisions and behaviors without such access—consider, for instance, the range of factors bearing on a decision to cooperate with a conspecific. If this is so, however, then EP’s claim to have solved the frame problem is undermined, and the modularist must confront the question of how our processes of reasoning, deliberation and planning could have access to so many and so varied of our background beliefs and desires. Presumably, evolutionary psychologists cannot invoke a single, hardwired ‘Decision Making Module’ here, for natural selection clearly cannot anticipate all the decisions we potentially face in a lifetime; moreover, the beliefs and desires that are relevant to these decisions vary with context and hence cannot be prespecified. Suppose that it is instead suggested that a group of encapsulated modules collaborate in the planning and execution of complicated actions. In that case, we must ask how their operations are coordinated. There seem to be two options. One is that there is a fixed hierarchy of modules, such that each module sends its outputs to the next one up in the hierarchy, and so on, until a behavioral command is outputted. Alternatively, there is some kind of ‘Module Integration Module’ (what Samuels et al., 1999 unironically call a “Resource Allocation Module”) which takes the outputs of various lower-level modules, evaluates them, and issues in the same behavioral instruction—Carruthers (2004, section 15.6) proposes that “an existing module…the natural-language faculty” (p. 307) performs this integrative task.
但这两种选择都不合理。一个进化而来的、硬连线的模块层次结构会遭到与决策模块相同的反驳:我们的行为太复杂,产生这些行为的心理过程也太多样,框架问题不可能通过一个预先指定的层次结构来解决。这样就只剩下模块集成模块这一构想:它接收其计算可能与给定问题相关的所有其他模块的输出,并决定如何处理它们。但是,一个能够 (i) 评估在给定情境中众多模块输出里哪些是重要的、(ii) 决定什么结果是可取的、然后 (iii) 找出哪些行为(以及按何种顺序)会导致该结果的“模块”,根本就不是(EP 意义上的)模块!相反,它在功能上等同于 Fodor 的中央处理器,并且,假设框架问题和组合问题是真实的问题,它会让这些问题重新出现。只要仔细考察大规模模块化心智究竟应当如何运作,就会发现框架问题并不是支持心智大规模模块化这一理论的论证;相反,它是反对该论题的论证!1
But neither of these alternatives is plausible. An evolved, hard-wired hierarchy of modules is vulnerable to the same objections as the Decision Making Module: our behaviors are simply too complex, and the mental processes giving rise to them too varied, for the frame problem to be solved by a pre-specified hierarchy. This leaves us with the idea of a Module Integration Module, which takes in the deliverances of all the other modules whose computations are potentially relevant to a given problem and decides what to do with them. But a ‘module’ that can (i) assess which of the plethora of modular outputs are important in a given context and (ii) decide what outcome is desirable and then (iii) figure out which behaviors (and in what order) will result in that outcome isn’t a module (in the EP sense) at all! Instead, it’s functionally equivalent to Fodor’s Central Processor, and, assuming that the frame problem and combinatorial problems are real problems, it raises them all over again. As soon as one looks in detail at how a massively modular mind is supposed to work, one sees that the frame problem is not an argument for the theory that the mind is massively modular; instead, it’s an argument against that thesis!1
假设刺激贫乏论证让我们相信,人们获得的某些假设或技能不可能仅从现有证据中习得。这表明,成功学习需要证据中没有的额外约束。进化心理学家(像其他刺激贫乏论证的支持者一样)很快就假设,所讨论的约束必须是 (i) 表征性的、(ii) 认知复杂的和 (iii) 特定于各种常识领域或主题的。因此,例如,我们被告知必要的约束是各种“理论”(例如,普遍语法、心智理论)。而且由于这些理论的内容远远超出了现有数据的范围,这种观点反过来表明 (iv) 学习所需的约束体现在天生指定的模块中(语言习得装置、心智理论模块等)。
Suppose that a poverty of the stimulus argument has convinced us that some hypothesis or skill which people acquire could not have been learned just from the evidence available. This shows us that additional constraints, not present in the evidence, are required for successful learning. Evolutionary psychologists (like other proponents of poverty of the stimulus arguments) are quick to assume that the constraints in question must be (i) representational, (ii) cognitively sophisticated and (iii) specific to various common-sense domains or subject matters. Thus, for instance, we are told that the necessary constraints are ‘theories’ of various sorts (e.g., universal grammar, theory of mind). And because the content of these theories so far outruns the available data, this view suggests in turn (iv) that the needed constraints on learning are embodied in innately-specified modules (Language Acquisition Devices, Theory of Mind modules, etc.).
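The point that hypotheses always outrun the data can be given a minimal worked example (a sketch of our own; the feature count and the two observations are arbitrary, not drawn from the text): treat a ‘hypothesis’ as any Boolean labeling of the 2^3 = 8 possible inputs, and note how little two observations narrow the field.

```python
from itertools import product

# Underdetermination in miniature: how many Boolean hypotheses over
# n binary features remain consistent with a handful of observations?

n_features = 3
inputs = list(product([0, 1], repeat=n_features))  # all 2**3 = 8 inputs

# A 'hypothesis' is any labeling of those 8 inputs: 2**8 = 256 in total.
hypotheses = list(product([0, 1], repeat=len(inputs)))

# Two observed data points (input -> label), chosen arbitrarily here.
observations = {(0, 0, 0): 1, (1, 1, 0): 0}

consistent = [
    h for h in hypotheses
    if all(h[inputs.index(x)] == y for x, y in observations.items())
]
print(len(hypotheses), len(consistent))  # 256 hypotheses, 64 survive
```

Each distinct observation at best halves the hypothesis space, while adding one more feature squares its size, so unconstrained data-driven elimination loses this race badly.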
然而,这幅图景本身已经超出了刺激贫乏论证所能担保的范围。因为该论证仅表明需要某些约束,而没有表明这些约束属于什么类型。因此,除了 EP 所假设的复杂表征性约束之外(或者取而代之),学习还可能由其他类型的约束来支撑。例如,可能存在各种知觉偏向,或将我们的注意力引向某些类型刺激的倾向,或关于我们奖励结构的事实,这些事实鼓励某些行为而非其他行为。例如,有证据表明,皮层下机制会优先把婴儿的视觉注意力引向大致符合面部模板的物体,而且奖励机制会释放化学物质,使婴儿在注意此类刺激时感觉良好(Johnson,1997)。这些机制本身无法产生成人面部识别行为的全部范围。然而,它们确实有助于减轻儿童所面临的欠决定(underdetermination)问题(为什么关注面部而不是肘部?为什么关注眼睛而不是下巴?),而且它们所产生的优先注视和注意,可能会导致逐渐构建出表现得像“面部识别模块”的皮质回路。
However, this picture itself outruns what is warranted by the poverty of the stimulus argument. For that argument indicates only that some constraints are needed, not what kinds of constraints those are. Thus, learning may be subserved by other types of constraint in addition to (or instead of) the sophisticated representational constraints postulated in EP. There might, for instance, be perceptual biases of various sorts, or dispositions to direct our attention to certain kinds of stimuli, or facts about our reward structures that encourage certain sorts of behavior rather than others. For example, there is evidence that subcortical mechanisms preferentially direct infants’ visual attention to objects that fit a loosely face-like template, and that reward mechanisms release chemicals that make infants feel good when attending to such stimuli (Johnson, 1997). By themselves, these mechanisms are incapable of generating the full range of adult face-recognition behavior. However, they do help in reducing the underdetermination problem faced by the child (why focus on faces rather than elbows? why focus on eyes rather than chins?), and the preferential looking and attending that they produce may result in the gradual construction of cortical circuits that behave like a ‘face recognition module.’
其他可能的约束是发育性或结构性的。控制发育各方面时间安排的时序(chronotopic)因素,可以通过引导各种学习任务的先后顺序来减轻欠决定问题:例如,如果你已经有了一种语言音素的表征,那么学习这种语言的语法就会更容易。此外,尽管皮质中发育出的突触连接的具体模式取决于经验,但皮质的总体结构(例如,不同区域的特征性层状结构和基本回路类型)很可能是由基因决定的(参见 §21.2.3)。这些结构特征本身并不等同于先天的表征或模块,但它们可能通过使某些区域偏向于承担某些任务而非其他任务,或鼓励某些类型的表征而非其他类型的表征在响应感官输入时发展起来,从而帮助大脑解决学习问题。正如这些例子所示,像进化心理学家经常做的那样,假设仅有两种可能——要么是完全不受约束的通用学习者,要么是预先配备大量领域特定知识的高度模块化学习者——是一个错误。
Other possible constraints are developmental or architectural. Chronotopic factors governing the timing of different aspects of development can reduce underdetermination by guiding the sequencing of various learning tasks: learning the grammar of a language is easier if you already have a representation of its phonemes, for example. In addition, although the detailed pattern of synaptic connections that develops in the cortex is experience-dependent, the gross architecture of the cortex (e.g., different areas’ characteristic laminar structures and basic circuitry types) may well be genetically specified (cf. §21.2.3). These architectural features do not themselves amount to innate representations or modules, yet they may help the brain to solve learning problems by biasing certain areas to assume some tasks rather than others, or encouraging certain sorts of representations rather than others to develop in response to sensory input. As these examples show, it is a mistake to suppose, as evolutionary psychologists frequently do, that the only two possibilities are either a completely unconstrained, general purpose learner or a heavily modular learner pre-equipped with large bodies of domain specific knowledge.
关于 EP 所谓心智是一个“模块系统”的主张,还有最后一点值得一提。本节讨论的两个论证支持的都是模块化假说的一个非常强的版本,即心智只包含模块,别无其他。如前所述,我们认为支持这种“大规模模块化”心智观的证据根本不令人信服。但是,还有一种更“温和”的模块化假说,即心智包含一些模块。(例如,Fodor(1983)的模块化假说就是温和的:它既假设了模块化的感官机制,也假设了非模块化的中央处理机制。温和版本也容纳了某些认知(而非感官)处理同样是模块化的这一可能性。)到目前为止,我们的讨论仍为某种温和模块化论题的正确性留有余地。然而,在下一节中,我们将论证 EP 中所使用的模块概念从根本上是不明确的。因此,虽然心智确实可能包含一些(某种意义上的)“模块”,但我们将在 §21.4 中看到,即使是温和模块化的倡导者,也需要大大澄清他们的论题究竟意味着什么。
One final point deserves to be made about EP’s claim that the mind is a ‘system of modules.’ Both of the arguments discussed in this section are arguments for a very strong version of the modularity hypothesis, namely, that the mind contains nothing but modules. As already indicated, we don’t think that the evidence for this ‘massively modular’ view of the mind is at all compelling. However, there is also a more ‘Modest’ modularity hypothesis to the effect that the mind contains some modules. (E.g., Fodor’s (1983) modularity hypothesis was Modest: it postulated both modular sensory mechanisms and non-modular central processing mechanisms. Modesty also embraces the possibility that some cognitive (as opposed to sensory) processing is modular too.) Our discussion so far leaves it open that some kind of Modest modularity thesis is correct. In the next section, however, we argue that the notion of a module, as deployed in EP, is fundamentally unclear. Thus, while the mind may indeed contain some ‘modules’ (in some sense of that word), we will see in §21.4 that even advocates of Modest modularity need to clarify considerably what their thesis amounts to.
现在我们来讨论模块是什么这一问题。我们认为,EP 文献中用于判定模块化的各种不同标准在外延上远非一致,因而导致了相当不同的“模块”概念。我们还强调,这些不同的模块化主张需要(但往往得不到)不同类型的支持证据。我们的结论是,EP 普遍未能认识到这些要点,这既削弱了其关于心智模块化的论证,也破坏了它所假设的那些特定认知模块的地位。
We turn now to the question of what modules are. We argue that the various different criteria used for modularity in the EP literature are far from co-extensive and thus lead to quite different notions of a ‘module.’ We also emphasize that these different modularity claims require (but often do not get) different sorts of supporting evidence. We conclude that EP’s widespread failure to recognize these points both weakens its case for the modularity of mind and undermines the status of the specific cognitive modules it postulates.
正如 Carruthers 所指出的(2004 年,第 15.1.1 节),也正如我们将在本节中抱怨的那样,EP 中“模块”一词的含义具有高度弹性。然而,前面几节已经预示了关于 EP 模块化概念的一个否定性要点:它与神经科学家的神经特异性概念关系不大。神经特异性首先是指,不同的大脑区域(相对地)专门负责不同的任务。例如,在大多数人中,左半球在语言处理中占主导地位——比如,词语产生或多或少局域化于布罗卡区、韦尼克区和左丘脑(Indefrey 和 Levelt,2000 年,第 854 页)。其次,神经特异性概念还包含这样一个事实:不同大脑区域和不同任务所使用的表征和计算可能非常不同。例如,对物体颜色的知觉涉及三种视网膜视锥细胞对其光谱特性的表征,并经过调整以补偿环境光的特性(Wandell,2000);而对声音的知觉则涉及对起始时间、音高和位置等低级声学特征的表征,继而计算音色等高阶特性,最终得到对语音、音乐或其他类型声响的表征(Shamma,2000)。
As Carruthers notes (2004, section 15.1.1) and as we will be lamenting in this section, the meaning of the term ‘module’ in EP is highly elastic. However, one negative point about EP’s notion of modularity has been foreshadowed in previous sections: it bears little relation to the neuroscientist’s notion of neural specificity. This is the idea, first, that different brain regions are (relatively) specialized to different tasks. In most people, for instance, the left hemisphere is dominant in language processing—word production, e.g., is more or less localized to Broca’s area, Wernicke’s area and the left thalamus (Indefrey and Levelt, 2000, p. 854). Secondly, the idea of neural specificity embraces the fact that the representations and computations that are used in different brain regions and for different tasks may be quite diverse. For example, the perception of an object’s color involves the representation of its spectral properties by the three retinal cone types, adjusted so as to compensate for properties of the ambient light (Wandell, 2000), whereas perception of sounds involves the representation of low-level acoustical features such as onset time, pitch and location, followed by the computation of higher-order properties such as timbre, resulting ultimately in the representation of items of speech, music, or other types of noise (Shamma, 2000).
现在,如果 EP 关于知觉与认知的处理和机制是“领域特定”或“模块化”的这些主张,其全部含义仅仅是这类过程和机制在神经上是局域化的,并且涉及对不同类型表征的不同类型计算,那么我们会欣然同意。例如,即使是最狂热的反模块化论者也不会怀疑,视网膜视锥细胞在提取声学信息方面是无效的。然而,正如我们已经暗示的,EP 的拥护者心中所想的通常远比这更强。
Now, if all that were meant by EP’s claims that perceptual and cognitive processing and mechanisms are ‘domain specific’ or ‘modular’ were that such processes and mechanisms are neurally localized and involve different kinds of computations over different kinds of representations, we would readily agree. Not even the most rabid anti-modularist doubts, for example, that retinal cones are ineffective at extracting acoustical information. However, as we have already suggested, adherents of EP generally have something much stronger in mind than this.
作为这方面的证据,首先考虑这样一个事实:上述神经特化通常是相对的,而不是绝对的。某个区域中的细胞可能对某些类型的输入反应特别强烈,或者在执行某项任务时特别活跃。但正如神经成像数据越来越清楚地表明的那样,它们通常也会对许多其他输入和任务要求做出反应,尽管不那么激烈。例如,Andersen 等人 (2000) 提供的证据表明,后顶叶皮层(传统上被认为是专门负责注意力和空间意识的)也参与了目标导向行为的规划。同样,DeAngelis 等人 (2000) 认为,通常被认为高度专门用于运动检测的皮质区 MT 也与立体深度感知有关。
As evidence for this, consider first the fact that the neural specialization described above is typically relative, rather than absolute. Cells in a certain area may respond especially strongly to certain kinds of inputs or may be particularly active in the execution of a certain task. But as neural imaging data are increasingly making clear, they will typically also respond, though less vigorously, to many other inputs and task demands. Andersen et al. (2000), for example, give evidence that the posterior parietal cortex, classically thought to be specialized for attention and spatial awareness, is also involved in the planning of goal-directed behavior. Similarly, DeAngelis et al. (2000) argue that cortical area MT, normally held to be highly specialized for motion detection, is also implicated in the perception of stereoscopic depth.
正如相同的大脑区域可能执行不同的任务一样,许多常识上认为是单一的任务可能涉及激活许多不同的大脑区域。例如,面部识别不仅涉及损伤和分离研究中提到的梭状回区域,还涉及海马旁回、海马、颞上沟、杏仁核和岛叶(McCarthy,2000 年)。同样,动词的产生涉及左额叶皮层、前扣带回、后颞叶和右小脑的区域(Posner and Raichle,1994 年,第 120 页)。因此,在神经层面上,面部识别或说出口语等任务仅在非常微弱和与任务相关的意义上由“单一机制”执行。
Just as the same brain areas may subserve different tasks, many tasks that common sense might count as unitary can involve activation of numerous different brain regions. Face recognition, for example, involves not only the areas in the fusiform gyrus that are cited in lesion and dissociation studies, but also the parahippocampal gyrus, the hippocampus, the superior temporal sulcus, the amygdala and the insula (McCarthy, 2000). Likewise the production of verbs involves areas in the left frontal cortex, anterior cingulate, posterior temporal lobe and right cerebellum (Posner and Raichle, 1994, p. 120). At the neural level, then, tasks like recognizing a face or producing a spoken word are performed by a ‘single mechanism’ only in a very attenuated and task-relative sense.
这种由相同神经区域分担任务以及将任务分布到众多不同区域的做法与 EP 所说的专门用于不同认知和感知任务的不同模块形成了鲜明对比。因此,进化心理学家关于“领域特定”或“专用”模块的说法不应与刚刚描述的神经特异性事实相混淆。但如果是这样的话,EP 所说的“专用”或“领域特定”处理又意味着什么呢?
This sharing of tasks by the same neural areas and the distribution of tasks over numerous different areas contrasts strongly with EP’s talk of distinct modules devoted to distinct cognitive and perceptual tasks. Hence, evolutionary psychologists’ claims about ‘domain specific’ or ‘dedicated’ modules should not be confused with the facts about neural specificity just described. But if that’s the case, what does EP’s talk of ‘dedicated’ or ‘domain specific’ processing amount to?
进化心理学家回答说,需要区分 Marr (1982) 所说的“实施层面”细节和“计算层面”理论(参见 Griffiths,2001)。由于他们的理论处于心理或计算层面,我们不应该期望他们假设的模块能够反映在神经组织的细节中。正如 Cosmides 和 Tooby 所说,EP“与认知层面的解释关系比与任何其他近因层面的关系更密切。”(1987,第 284 页)
Evolutionary psychologists answer that one needs to distinguish between what Marr (1982) called ‘implementational level’ details on the one hand, and theories at the ‘computational level’ on the other (cf. Griffiths, 2001). Since their theories are at the psychological or computational level, we should not expect the modules they postulate to be reflected in the nitty gritty of neural organization. As Cosmides and Tooby put it, EP “is more closely allied with the cognitive level of explanation than with any other level of proximate causation.” (1987, p. 284)
但是,尽管只在某一描述层面上进行理论化、而忽略来自其他层面的约束,这种冲动在认知科学中十分普遍,我们仍认为这是一个错误。任何心理学家都不应忽视神经科学,因为心理学理论必须能够在大脑中实现,而且,正如日益明显的那样,这一约束极其强硬。进化心理学家忽视大脑如何执行心理任务的事实,更是错上加错。首先,正如 §21.2.1 所明确指出的,心理功能的个体化必须受到实施层面信息的约束。其次,正如 §21.3.1 所强调的,如果忽视实施问题,就无法有效地对自然选择如何作用于心智和大脑进行理论化。第三,心理层面与实施层面之间的截然二分会削弱 EP 的核心证据来源之一:如果 EP 的模块与大脑无关,就很难看出 EP 文献中经常引用的那类神经科学数据(关于定位、分离等)有何相关性(参见 Carruthers,2004,第 15.4.2 和 15.4.3 节;另见 Pinker,1999)。但最重要的是,忽视实施层面的约束,可能会使 EP 的模块化和任务特异性概念被抽空任何实际内容。如果模块概念不与关于神经特异性的主张挂钩,那么它究竟意味着什么?接下来,我们将回顾被归于模块的几个特征,并考察它们之间的相互关系。
But while the urge to theorize at one level of description while ignoring constraints from other levels is endemic to cognitive science, we think that it is a mistake. No psychologist should ignore the neurosciences because psychological theories must be implementable in brains and, as is increasingly becoming apparent, this constraint is an extremely strong one. It is doubly a mistake for evolutionary psychologists to neglect facts about how psychological tasks are performed by the brain. First, as §21.2.1 made clear, the individuation of psychological functions must be constrained by implementational information. Second, as §21.3.1 urged, one cannot usefully theorize about how natural selection operates on the mind and brain while neglecting implementational issues. Third, a sharp psychological/implementational divide undermines one of EP’s central sources of evidence: if EP’s modules have nothing to do with the brain, it is hard to see the relevance of the sorts of neuroscientific data (about localization, dissociations, etc.) that are frequently cited in the EP literature (cf. Carruthers, 2004, sections 15.4.2 and 15.4.3; and see Pinker, 1999). Most importantly, though, neglecting implementational constraints threatens to leach EP’s notions of modularity and task-specificity of any real content. If the notion of a module is not tied to claims about neural specificity, what does it amount to? In what follows, we review several features that have been ascribed to modules and examine their interrelations.
EP 中常被归于模块的一个特征是可分离性或独立可破坏性,其想法是:如果两个模块是不同的,那么应该可以(至少在原则上)干扰其中一个模块的运作而不影响另一个模块的运作。2正如我们已经指出的(§21.2.1),EP 缺乏对模块的内在刻画,因而无法直接确定一个被独立识别出的机制是否与另一个机制相分离。相反,模块是根据它们被假定执行的任务来进行功能刻画的,而实际观察到的分离是任务之间的分离(例如,词语产生与合乎语法的句子理解之间的分离)。正是这些任务之间的分离被当作独立模块存在的证据。因此,例如 Carruthers(2004,第 15.4.2 节)和 Pinker(1994,49ff.)认为,特定语言障碍和威廉姆斯综合征患者身上表现出的一般认知任务与语言产生和理解任务之间的双重分离,有力地证明了语言背后存在一个任务特定的心理模块。
One feature that is often ascribed to modules in EP is dissociability or independent disruptability, the idea being that if two modules are distinct, then it should be possible (at least in principle) to interfere with the operation of each one without affecting the operation of the other.2 As we have already observed (§21.2.1), EP lacks an intrinsic characterization of modules that would allow one to determine directly whether one independently-identified mechanism has dissociated from another. Instead, modules are characterized functionally, in terms of the tasks that they are assumed to perform, and the dissociations that are actually observed are dissociations between tasks (e.g., between production of words and comprehension of grammatical sentences). It is these dissociations among tasks that are taken to be evidence for the existence of independent modules. Thus, Carruthers (2004, section 15.4.2) and Pinker (1994, 49ff.), for instance, argue that the double dissociation between general cognitive tasks and language production and comprehension tasks seen in subjects with Specific Language Impairment and Williams syndrome is strong evidence that there is a task-specific mental module underlying language.
虽然分离的证据意义是一个复杂的主题,我们无法在此充分展开,但这类推论远比人们通常认识到的更成问题。3首先,有许多直观上并非模块化的架构也可能产生任务之间的双重分离(参见 Shallice,1988,245ff.)。其次,至关重要的是要区分源于发育障碍的分离与源于成人大脑损伤(或实验性操纵)的分离:前者关乎学习或发育机制,后者关乎成熟的心理能力。第三,从能力的双重分离推断模块的不同性,通常还需要额外的经验假设,例如:(i)“普遍性”假设,即正常与异常受试者共享同一认知架构(异常者受损的模块除外);(ii)“减法”假设,即脑损伤只会移除模块或它们之间的连接,而不会引起任何显著的神经重组;(iii)各种“门控”假设,即要中断某项任务,是必须破坏参与该任务的模块之间的一条连接还是全部连接。(参见 Shallice,1988,218ff.;Glymour,2001,第 135-6、143-4 页。)
While the evidential significance of dissociations is a complicated subject to which we cannot do justice here, such inferences are far more problematic than is generally appreciated.3 First, there are a number of intuitively non-modular architectures that can give rise to double dissociations among tasks (cf. Shallice, 1988, 245ff.). Second, it is crucial to distinguish between dissociations arising from developmental disorders and dissociations resulting from injuries to (or experimental manipulations of) adult brains. The former bear on mechanisms of learning or development, and the latter on mature psychological competences. Third, inferences from a double dissociation of capacities to the distinctness of modules generally require additional empirical assumptions, such as (i) a ‘universality’ assumption to the effect that both normal and abnormal subjects share a cognitive architecture (excluding the damaged modules in abnormals); (ii) a ‘subtraction’ assumption to the effect that brain damage only removes modules or the connections between them and it does not engender any significant neural reorganization; and (iii) various ‘gating’ assumptions about whether the destruction of one or all connections between modules involved in a task is necessary for disruption of the task. (Cf. Shallice, 1988, 218ff.; Glymour, 2001, pp. 135-6, 143-4.)
这些假设在经验上是可疑的,尤其是当所讨论的分离在起源上是发育性或遗传性的时候。首先,具有遗传异常(或童年期脑损伤)的受试者很可能在许多不同方面都与正常受试者不同。其次,众所周知,童年早期出现的能力缺失会引发补偿性的心理策略和实质性的神经重组。
These assumptions are empirically questionable, especially when the dissociations in question are developmental or genetic in origin. First, subjects with genetic abnormalities (or childhood brain injuries) are likely to differ from normal subjects in many different ways. Secondly, incapacities appearing early in childhood are known to call forth compensatory psychological strategies and substantial neural reorganization.
因此,与 Carruthers 和 Pinker 所暗示的相反,患有特定语言障碍的受试者与正常受试者的区别仅在于语言功能受损,这种可能性极小。相反,正如许多实证研究证明的那样,此类受试者还存在许多其他认知和感知缺陷。4因此,语言与一般认知能力之间分离的明确性被削弱了——EP 从这种分离推断出这些能力背后存在不同的模块,这种推断也是如此。
Hence, and contrary to what Carruthers and Pinker imply, it is extremely unlikely that subjects with Specific Language Impairment differ from normal subjects only in having impaired language function. Instead, as many empirical studies attest, such subjects have numerous other cognitive and perceptual deficits as well.4 Thus, the cleanness of the dissociation between language and general cognitive abilities is undercut—as is EP’s inference from that dissociation to the existence of distinct modules underlying those abilities.
在本节最后,我们再次承认,存在一种非常“单薄”的模块概念,在该概念下,给定某些其他假设,双重分离确实蕴涵(这种意义上的)模块化。例如,如果我们径直假设每一种不同的能力都以一个不同的模块为基础(参见 §21.2.2 中讨论的“一对一”假设),所有正常主体共享同一架构,并且如果我们把任何两个人之间能力的分离都视为表明这些能力(在所有主体中)是不同的,那么我们就得到了一个从可分离性到模块不同性的毫无问题的推论。然而,这个漂亮的推论是以一个不太有趣的“模块”概念为代价换来的。一旦我们开始赋予模块其他“更厚”的属性(如信息封装或独立可进化性(§§21.4.3-21.4.4)),这一推论就远没有那么令人信服了,因为这些属性未必适用于由可分离性标准所区分出的模块。
We conclude this section by again acknowledging that there is a very ‘thin’ notion of module such that, given certain other assumptions, a double dissociation entails modularity (in that sense). For example, if we simply assume that a distinct module underlies each distinct capacity (cf. the ‘One-to-one’ assumption discussed in §21.2.2), with all normal subjects sharing the same architecture, and if we count a dissociation of capacities in any two people as indicating that those capacities are distinct (across all subjects), then we have an unproblematic inference from dissociability to distinctness of modules. However, this pretty inference is bought at the cost of a not-very-interesting notion of ‘module.’ As soon as we begin to invest modules with other, ‘thicker’ properties—like informational encapsulation or independent evolvability (§§21.4.3–21.4.4)—the inference becomes far less compelling, as these properties do not necessarily apply to modules as distinguished by the dissociability criterion.
人们还常说模块是信息封装的,其意思是:其他心理系统只能访问作为模块输出的那些信息;模块内部进行的处理无法被心智其他部分的信息或过程所访问或影响(Fodor,1983)。然而,尚不清楚这一特征在挑选出不同认知机制方面有多大用处。首先,信息封装往往是一个相对的问题,而非全有或全无的问题。某些大脑或心理机制可能与另一些机制完全信息隔离(即不存在机制 A 的内部运作受机制 B 影响的情形),这是可信的。但许多机制(如果不是几乎所有机制)在其内部运作中都会受到某些其他机制的影响——或者说,只要我们不把“内部运作”这一概念平凡化(见下文),情况就确实如此。与此相关,信息封装往往似乎还是相对于任务而言的:机制或大脑区域 A 的内部处理是否受到机制或区域 B 中的信息或处理的影响,可能随 A 和 B 所从事的任务而变化。
Modules are also often said to be informationally encapsulated in the sense that other psychological systems have access only to the information that is the output of the module; the processing that goes on within it is not accessible to, or influenced by, information or processes in other parts of the mind (Fodor, 1983). However, it is not clear how useful this feature is in picking out distinct cognitive mechanisms. First, informational encapsulation is often a relative, rather than an all-or-nothing matter. It’s plausible that some brain or psychological mechanisms may be completely informationally isolated from some other mechanisms (in the sense that there are no circumstances in which mechanism A is internally influenced by mechanism B). But many if not virtually all mechanisms are influenced in their internal operations by some other mechanisms—or at least this is true if we don’t trivialize the notion of an ‘internal operation’ (see below). Relatedly, informational encapsulation often seems to be task-relative. Whether mechanism or brain region A is influenced in its internal processing by information or processing in mechanism or region B may vary depending on the tasks A and B are engaged in.
为了说明这些观点,我们来看看注意力在许多心理过程中的作用。有证据表明,尽管低级视觉处理(例如发生在初级视觉区 V1 中的处理)通常是相对封装的,但它可以被发生在其他神经区域中、涉及视觉注意力的高级处理所修改(Luck 和 Hillyard,2000)。这类结果引出了关于 EP 任务个体化的熟悉问题:V1 中的处理过程是否会依注意力是否参与而执行不同的任务或功能?它还削弱了封装这一模块化标准的实用性。V1 对视觉刺激的处理会随受试者是否注意该刺激而改变,这一事实是否表明,V1 对于涉及注意力的任务是非封装的,而对于其他不涉及注意力的任务则是封装的?如果是这样,那么是否存在两个与 V1 相关的模块,一个在注意力参与时运作,另一个在注意力不参与时运作?Peter Carruthers(私人通信)提出,如果注意力有时会影响 V1 中的处理,那么注意力就应当算作 V1 的输入,而非对其内部处理的影响。因此,他认为,V1 中的处理终究还是封装的。我们的回应是:只有当存在某种基础来区分“模块的输入”与“影响模块内部运作的过程”时,信息封装的概念才有意义,因为封装主张所否认的正是后一种影响。如果对某一机制内部处理的任何信息影响都可以被重新概念化为该机制的输入,并且经由输入产生的影响与封装相容,那么信息封装的概念就是空洞的。
As an illustration of these points, consider the role of attention in many psychological processes. There is evidence that although low-level visual processing, such as occurs in the primary visual area V1, is often relatively encapsulated, it can be modified by higher-level processes involving visual attention, which occur in other neural regions (Luck and Hillyard, 2000). This kind of result raises familiar issues about EP’s individuation of tasks: are the processes in V1 performing different tasks or functions depending on whether attention is involved? It also undermines the usefulness of the encapsulation criterion for modularity. Does the fact that the processing of a visual stimulus by V1 is altered depending on whether subjects pay attention to that stimulus show that V1 is unencapsulated with respect to tasks that involve attention but encapsulated with respect to other tasks not involving attention? If so, are there two modules associated with V1, one operative when attention is involved and the other when it’s not? Peter Carruthers (private communication) suggests that if attention sometimes influences processing in V1, then attention should count as an input to V1, not an influence on its internal processing. Hence, he argues, processing in V1 is encapsulated after all. Our response is that the notion of informational encapsulation only makes sense if there is some basis for distinguishing between inputs to a module and processes that influence the internal operation of the module, for it’s the latter kind of influence that claims of encapsulation deny. If any informational influence on the internal processing of a mechanism can be reconceptualized as an input to that mechanism, and if influence via input is consistent with encapsulation, then the notion of informational encapsulation is vacuous.
我们已经论证过(§21.3.1),可分离性是模块化的一个可疑标准。就当前的问题而言,它同样帮助不大,因为与人们通常的假设相反,封装与可分离性之间并没有什么简单的联系。考虑一下范埃森(van Essen)绘制的著名的猕猴视觉系统图(图 21.1)。
We have already argued (§21.3.1) that dissociability is a dubious criterion for modularity. It’s also of little help in the present connection, for contrary to what is often assumed, encapsulation bears no simple connection to dissociability. Consider the well-known diagram of the macaque visual system due to van Essen (figure 21.1).
图 21.1
猕猴大脑中视觉区域的层次结构,显示了 32 个视觉皮层区域及其连接,以及一些皮层下和非视觉连接。(摘自 Felleman 和 van Essen,1991 年。)
Figure 21.1
The hierarchy of visual areas in the macaque brain, showing 32 areas of visual cortex and their linkages, together with some subcortical and nonvisual connections. (From Felleman and van Essen, 1991.)
我们在这里看到大约 32 个皮质“区域”,以及一些皮质下区域。这些区域对不同类型的刺激有不同的敏感度,和/或专门用于不同类型的处理(尽管通常并非以全有或全无的方式)。假设它们彼此之间也至少在某种程度上可以分离,那么(根据一对一标准和可分离性标准)该图中描绘的就多达 32 个不同的模块。然而,这些皮质区域也是高度互联的:van Essen 追踪到 197 条连接(约占原则上可能的 32 × 31 / 2 ≈ 500 条连接的 40%)。其中大多数连接似乎是双向的,这表明在这些假设的模块之间不存在简单的顺序性或层次性的信息流方向;相反,每个模块在视觉处理的多个不同阶段与许多其他模块对话(也被许多其他模块对话)。这就引出了一些严重的问题:可分离性标准应当如何与模块化的封装标准对应起来,以及后一标准应当如何解释。图 21.1 中发现的这种互联性与这些区域是不同模块的说法相容吗?如果相容,那么封装(或许模块化也一样)似乎就是有程度之分的,而非全有或全无的问题;在这种情况下,我们需要 (i) 某种封装程度的度量,以及 (ii) 一种关于它如何影响模块化判断的理论。另一方面,如果模块化论者更愿意说这种程度的互联性与这些区域构成不同模块的想法不相容,那么由此可知,功能的不同性和可分离性并不是个体化模块的可靠标准。
We see here some 32 cortical ‘areas,’ as well as some subcortical areas. These areas are differentially sensitive to different sorts of stimuli and/or specialized for different sorts of processing (although typically not in an all-or-nothing fashion). Assuming that they are also susceptible of at least some degree of dissociation, we would appear to have (by the 1:1 and dissociability criteria) as many as 32 distinct modules depicted in this diagram. However, these cortical areas are also highly interconnected: van Essen traced 197 linkages (= roughly 40% of the (32 × 31)/2 ≈ 500 linkages that are in principle possible). Most of these linkages appear to be reciprocal, indicating that there is no simple sequential or hierarchical direction of information flow among the postulated modules; instead, each module talks to (and is talked to by) numerous others at numerous different stages of visual processing. This raises serious questions about how the dissociability criterion is supposed to line up with the encapsulation criterion for modularity and how the latter criterion is to be interpreted. Is the sort of interconnectedness found in figure 21.1 consistent with these areas being distinct modules? If so, it looks as though encapsulation (and perhaps modularity as well) come in degrees, rather than being all-or-nothing matters, in which case we need (i) some measure of degree of encapsulation and (ii) a theory about how this bears on judgments of modularity. If, on the other hand, modularists prefer to say that this degree of interconnectedness is inconsistent with the idea that the areas form distinct modules, then it follows that distinctness of function and dissociability are not reliable criteria for individuating modules.
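The linkage figures just quoted are simple pair counting, and the arithmetic can be checked directly (32 areas and 197 linkages are the text's numbers; the script itself is ours):

```python
# Check of the van Essen connectivity arithmetic quoted in the text:
# with n areas, the number of possible unordered pairwise linkages
# is n * (n - 1) / 2.
n_areas = 32
possible_links = n_areas * (n_areas - 1) // 2   # unordered pairs of areas
observed_links = 197                            # linkages traced by van Essen

print(possible_links)                                # 496, i.e. 'roughly 500'
print(round(100 * observed_links / possible_links))  # 40 (percent)
```

So the "roughly 40%" figure is 197/496 ≈ 39.7%, rounded.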
Still another criterion for modularity is that modules are independent targets of natural selection. That is, selection must be able to change each of them independently of the others. This feature of modules is presupposed by EP’s characteristic view of organisms as confronting a large collection of separate adaptive problems, each of which gets an independent evolved module by way of solution.
The independent evolvability criterion, however, is again problematic. For if a trait is to be an independent target of selection, it must be what Sterelny and Griffiths (1999) call a ‘mosaic’ rather than a ‘connected’ trait.5 To use one of their examples, skin color is a plausible candidate for a mosaic trait because “it can evolve with relatively little change in the rest of the organism” (1999, p. 320). By contrast, having two lungs is a connected trait: you can’t change this trait without changing a great deal else in the organism because lung number is influenced by the genes and developmental mechanisms that govern the bilateral symmetry of the organism. Hence, natural selection can only influence lung number by influencing these genes and developmental mechanisms, and this in turn would affect many other phenotypical features. Since lung number is not an independent target of selection—since it is part of the “bigger package” (Sterelny and Griffiths, 1999, p. 320)—it would be a mistake to try to give an adaptive explanation of our having two lungs simpliciter. Instead, what needs to be explained is the evolution of bilateral symmetry.
Evolutionary psychologists assume that modules are independently evolvable, that is, that they are mosaic traits (like skin color) rather than connected traits (like having two lungs). However, there is evidence that many human cognitive abilities may be connected rather than mosaic traits. For example, Finlay and Darlington (1995) show that brain structures change in size across species in a highly coordinated and predictable manner: homologous structures enlarge at different but stable rates when compared to overall brain size. It is thought that these regularities reflect deeply entrenched developmental constraints on the order of neurogenesis, suggesting that while natural selection can increase (or decrease) the size of the brain as a whole, the sizes of particular cortical regions cannot be changed independently, even in response to specific and pressing selective problems. Thus, natural selection may not be able to ‘fine tune’ the cortical regions responsible for (say) cheater detection or the perception of numerosity independently of the (allegedly) distinct cognitive modules that underlie other cognitive capacities like face recognition or language.
A further question concerns the relationship between the independent evolvability criterion and the other features of modules discussed above. We submit that there is no connection between these properties: independent evolvability does not entail, and is not entailed by, either independent disruptability or informational encapsulation. Indeed, it is a consequence of the arguments presented in this paper that there is no connection whatsoever between any of the properties—independent disruptability, informational encapsulation, innateness, independent evolvability—that are commonly ascribed to modules.
This is important, because it undermines a pattern of argument that is highly prevalent in the EP literature. Evolutionary psychologists provide evidence for the existence of a module in some sense (e.g. in the sense that performance on two tasks dissociates) and then go on to assume (without argument) that the module in question satisfies the other criteria discussed above as well. Thus, they slide from hypotheses of modularity in one of the various ‘thin’ senses we have discussed in this paper to claims about the existence of modules in a much ‘thicker’ and more substantive sense.
This slide is wholly unjustified. As an illustration, consider Cosmides and Tooby’s (1992a) well-known experiments on the Wason selection task and their subsequent hypothesis of a ‘cheater detection’ module. Prima facie, what their experimental results show is that people behave differently (and in some respects more reliably) when dealing with conditionals framed as rules governing social exchange than they do when dealing with conditionals with other contents. Even if we accept that these results establish differential performance on cheater-detection tasks tout court (and not just those that involve conditionals—itself a big jump), they do not constitute evidence for the existence of a distinct cheater-detection module in any more robust sense. That is, they do not even remotely suggest that cheater detection is subserved by an independently-disruptable, informationally-encapsulated psychological mechanism which has been subject to distinct selection pressures and which as a consequence is genetically specified or ‘innate’ etc. It is of course conceivable, although (we think) unlikely, that a cheater detection module possessing all these features exists; our point is that Tooby and Cosmides’ experiment provides no evidence that it does.
Our overall argument in §21.4 can be put as follows. Interpreted one way (as involving a sufficiently ‘thin’ conception of a module), EP’s claims about modularity amount to little more than redescriptions of certain experimental results or evolutionary psychologists’ functional speculations. So construed, claims about the existence of ‘mental modules’ are uncontroversial—but also uninteresting. Modularity claims become more contentful and more interesting as the ‘thin’ notion of a module is extended to include the other properties described above. However, not only is the evidence that would support such extensions rarely provided, but what we know about the brain makes it unlikely that there could be ‘thick’ mental modules for the sorts of high-level cognitive capacities that are EP’s main theoretical focus.
Much of the appeal of EP derives from the fact that it appears to provide a way of ‘biologizing’ cognitive science, with evolutionary considerations supposedly providing powerful additional constraints on psychological theorizing. We think that this appearance is misleading. Evolutionary psychologists largely ignore the biological evidence that has the strongest scientific credentials and is most directly relevant to their claims about psychological mechanisms. This includes not only evidence from neurobiology, genetics, and developmental biology, but also any evidence from evolutionary biology, ethology and population genetics that threatens to undermine their armchair adaptationism. Their methods assume, wrongly, that one can usefully speculate about biological and psychological functions in ignorance of information about structure, genes, and development. Their central theoretical concept—modularity—is left fundamentally unclear. And their picture of the mind as ‘massively modular’ fails to do justice to many of its most important features, such as its capacity to engage in long-range planning and its remarkable cognitive and behavioral flexibility.
1. We concede that our discussion does not even begin to explain how human beings manage to take account of a wide range of background information and act flexibly and reasonably. But modular theories are in far worse shape. They not only fail to provide a positive account of how the problem is solved, but also make assumptions that are inconsistent with the fact that we do (somehow) solve the problem. Alternatively, and to the extent that they do attempt to accommodate this fact, they are forced to abandon basic commitments of the modular account.
2. Dissociations are often thought to be particularly compelling evidence of independent modules when there is a ‘double dissociation’ of tasks, that is, when a pair of individuals is observed, one of whom can perform task A but not task B, and the other of whom can perform B but not A.
3. This issue is the subject of considerable debate. See Shallice, 1988 and Glymour, 2001 for surveys.
4. See, e.g., Vargha-Khadem and Passingham, 1990, Anderson et al., 1993, Merzenich et al., 1996.
5. Gilbert (2000, 693ff.) calls this a requirement of ‘modularity’—not to be confused with the cognitive modularity that concerns EP.
Mathematics is the paradigmatic abstract pursuit. Computation, insofar as it is understood as a branch of applied mathematics, would seem to share some of math’s disembodied nature. And the very possibility of artificial intelligence would seem to rest on the fact that it doesn’t really matter what is doing the computing—silicon and brains are all the same, so long as they can implement the same functions. Yet when we look at actual intelligent organisms, it is obvious that much of our intelligence is devoted just to getting around the world. We are, in Graziano’s (2008) apt phrase, “intelligent movement machines.” A child learning to climb the monkey bars is engaged in something deeply intelligent—and yet it is very hard to see how to capture that intelligence in pure mathematics without leaving out something important. Watching an ant make its way over hills of sand, Simon reasonably wondered how much of the observed complexity in behavior can be attributed to the ant and how much can be attributed to the world through which it moves (Simon, 1969).
The three papers in this concluding part grapple with various aspects of this insight. They ask to what extent understanding intelligence, or cognition generally, requires attending to the active, physical systems in which it is instantiated (in which the mind is said to be “embodied”) and the world in which those systems are working (in which the mind is said to be “embedded”).
These two claims are the focus of Haugeland’s classic “Mind Embodied and Embedded” (chapter 22). Starting with Simon’s insight about the ant, Haugeland argues that the coupling between ourselves and the world might be tight enough that it is hard to decompose the mind, body, and world into independent components in a mechanism. Many of our activities in the world require tightly coupled interactions between what’s going on “in the head” and in the world itself. A person playing Tetris might manipulate the falling objects so they can just “see” how they fit into the row beneath them, and a person driving to Denver might just get on I-70 and exit when the signs instruct them to do so. These are examples of what has come to be called “active externalism.” Yet Haugeland seems to propose that meaning is also world-involving, scaffolding itself on a “web of significance” in worldly objects or on one’s commitment to participate in a community of concept-users (often referred to as “meaning externalism”). (See also the discussion of externalized views of intentionality in the introduction to part III.)
Note that it is possible to be embodied without being human. A key commitment of much work in this area is that the true test for AI will be mobile robots that interact with the world in real time while doing complex, real-world tasks.
“Intelligence without Representation,” by Brooks (chapter 23), argues from an engineering perspective that the best way to build autonomous mobile robots is to avoid having them build complex models of the world. As Brooks suggests, the robot can let the world be its own best representation. If it needs to see if there’s a bottle of root beer in the fridge, it might be better to have it open the door and look around than it would be to have it store and update an internal representation of FRIDGECONTENTS. Similarly, Brooks argues, we can build intelligent machines without supposing (à la Fodor) that there is some central system that integrates all of the inputs into a central cognition or decision mechanism (an executive or a self); minimal constraints among low-level processing units might suffice to produce all the intelligent behaviors we want. Rather than track and manipulate static facts about what’s true, he argues, a good robot divides up the world in terms of what it can do to it and acts on the information required to get tasks done.
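Brooks’s layered, model-free style of control can be caricatured in a few lines of code. The sketch below is our own illustration, not Brooks’s actual subsumption architecture: each layer maps raw sensor readings straight to an action, with no world model anywhere, and the first layer with something to say wins.

```python
# Illustrative caricature of subsumption-style control (not Brooks's code):
# layers map sensing directly to action; no internal world model is kept.

def avoid(sensors):
    """Reflex layer: turn away from nearby obstacles."""
    if sensors.get("obstacle_close"):
        return "turn_away"
    return None  # nothing to say; defer to another layer

def wander(sensors):
    """Default layer: just move about."""
    return "move_forward"

def act(sensors, layers):
    # Layers are ordered by priority; the first non-None output wins.
    for layer in layers:
        action = layer(sensors)
        if action is not None:
            return action
    return "idle"

print(act({"obstacle_close": True}, [avoid, wander]))  # turn_away
print(act({}, [avoid, wander]))                        # move_forward
```

The point of the sketch is that coordinated-looking behavior falls out of a handful of cheap sensor-to-action rules, with the world itself—rather than a stored FRIDGECONTENTS-style model—supplying the state.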
Finally, Webb’s “What Does Biorobotics Offer Philosophy? A Tale of Two Navigation Systems” (chapter 24) canvasses the fruitful overlap between robotics and insect neuroscience that has been done since the mid-1990s. Many of the other AI approaches detailed in this volume require substantial computational resources. By contrast, the desert ant and the honeybee are capable of prodigious feats of navigation despite having orders of magnitude fewer neurons than humans. Webb demonstrates that many of the mechanisms by which this is achieved are now reasonably clear, but that the upshot for representational theories of cognition remains a contested question. What is clear, however, is that there is a lot to learn from both studying and building simple, embodied systems that rely only on relatively lightweight representational systems.
The material in this part is in dialogue—sometimes explicitly—with the material in parts I and II. It is useful to contrast (say) the traditional Marrian focus on the computational and algorithmic levels with the implementation-focused material in this chapter. When we erase the conceptual boundaries between mind, body, and world, we also blur the distinction between the abstract characterization of the task and the mechanisms in terms of which it is implemented: the mechanisms are themselves part of the computation.
Robotics is an important part of modern engineering, but much of this work now goes on in proprietary environments. There is a lot of work on robotics in ethics and cognate fields, especially as robots press into fields like transportation and nursing. We have set aside ethics of AI, however, and recent philosophical work on robotics in other subfields appears to be rather thin on the ground. This represents a good opportunity for future work by enthusiastic philosophers. One trend worth noting is the incorporation of uncertainty into robotics: a realistic robot cannot be entirely sure it knows what is going on or whether its actions have had the desired effect. Modeling robotic decision-making in terms of Partially Observable Markov Decision Processes (POMDPs) has become popular. POMDPs arise naturally as extensions of the reinforcement learning (RL) techniques discussed in chapter 16. This has become more relevant for robotics as solvers for POMDPs with realistic numbers of variables have been discovered (see Kurniawati 2021 for a recent review).
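At the core of a POMDP is Bayesian updating of a belief over hidden states after each noisy observation. As a toy illustration (ours, not from the text), consider a robot unsure whether a door is open, equipped with a sensor that reports correctly only 80% of the time; the numbers are assumed for the example:

```python
# Toy POMDP-style belief update (illustrative only): the robot tracks
# P(door is open) and revises it after a noisy reading via Bayes' rule.

def update_belief(p_open, observation, hit=0.8, miss=0.2):
    """Return P(open | observation).
    hit  = P(sensor says open | door open)
    miss = P(sensor says open | door closed)"""
    if observation == "sees_open":
        num = hit * p_open
        den = hit * p_open + miss * (1 - p_open)
    else:  # "sees_closed"
        num = (1 - hit) * p_open
        den = (1 - hit) * p_open + (1 - miss) * (1 - p_open)
    return num / den

belief = 0.5                         # initially maximally uncertain
belief = update_belief(belief, "sees_open")
print(round(belief, 2))              # 0.8
```

A full POMDP solver then chooses actions against such beliefs rather than against known states, which is what makes the problem (and its recent approximate solvers) computationally demanding.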
For readers who want a peek into the era about which Brooks writes, Errol Morris’s 1997 documentary “Fast, Cheap, and Out of Control” features some of Brooks’s robots in action and is an interesting exploration of the idea of distributed intelligence more generally.
Embodiment has sometimes been thought to solve the issues about intentionality raised in part III: crudely speaking, the Chinese room may not have intentionality, but an appropriately configured robot moving about in a world would. This possibility has been raised and discussed extensively in the responses to Searle’s arguments (chapter 12 of this volume).
Dynamic Systems. Most models of computation familiar to working philosophers are digital, and advances in AI have come almost exclusively in the digital space. Yet as Maley pointed out in chapter 5 of this volume, there are analog models of computation as well. Furthermore, analog computation has long had a place at the AI table, beginning with Wiener (1948). The cybernetic tradition continues to inform both robotics and AI, including the predictive processing models discussed in part IV. When Mind Design II was published, there was considerable enthusiasm about so-called Dynamic Systems Theory (DST). DST argues for a so-called dynamic approach to cognition, on which cognition is best modelled using continuous models like those offered by differential equations. The focus of the dynamic systems theorists is often squarely on real-time action by embodied systems interacting continuously with the world. Initial enthusiasm for DST appears to have waned a bit—no doubt in part because differential equations remain mathematically heavy going, whereas the basics of digital computers are very easy to grasp. Yet DST has found many applications and continues to form an important research program in philosophy of mind.
Active Externalism. Many approaches emphasize a tight connection between mind and world, especially in skilled action. The connection is still a merely causal relationship, albeit a particularly tight one: the mind itself depends only on the brain, and remains safely locked inside the bounds of the skull. Some authors have gone even further and suggest that the mind can actually be constituted by external objects: that is, in the right cases, your mind literally extends into the environment itself. The so-called extended mind hypothesis is interesting for AI for at least two reasons. On the one hand, it suggests that the right unit of analysis of minds is brains/machines plus environments. Current AI might be missing the scaffolded environment that is key to human intelligence. On the other hand, it suggests that tight integration with computational systems—smartphones, Google, and the like—literally expands our own cognitive capacities. Note that this is parallel to but distinct from the questions about content externalism raised in the discussion of part III—there, the question was about whether two things with identical insides could have different contents of thought, whereas here, the question is much more about how to understand the mind itself.
John Haugeland
1998
Among Descartes’s most lasting and consequential achievements has been his constitution of the mental as an independent ontological domain. By taking the mind as a substance, with cognitions as its modes, he accorded them a status as self-standing and determinate on their own, without essential regard to other entities. Only with this metaphysical conception in place, could the idea of solipsism—the idea of an intact ego existing with nothing else in the universe—so much as make sense. And behind that engine have trailed the sorry boxcars of hyperbolic doubt, the mind-body problem, the problem of the external world, the problem of other minds, and so on.
Although the underlying assumptions have been under fire, off and on, at least since Hegel—including with renewed intensity in recent years—most of the challenges have been of a general sort that I will call “interrelationist”. Characteristically, they accept as a premise that the mental, or at any rate the cognitive, has some essential feature, such as intentionality or normativity, and then argue that this feature is impossible except through participation in some supra-individual network of relations. For instance, accounts based on interpretation and the “principle of charity”, such as those of Donald Davidson and Daniel Dennett (with roots in Quine), ascribe contentful states only as components of an overall pattern that is rational in context—that is, in relation to the system’s situation or environment. Similarly, social practice accounts, such as those of Richard Rorty and Robert Brandom (with roots in Sellars and Wittgenstein), understand the norms that enable reasoning and content to be instituted communally—that is, in relation to the practices and responses of others. On neither approach is solipsism even a coherent possibility.
Interrelationist arguments are holistic in the specific sense that they take cognitive phenomena to be members of some class of phenomena, each of which has its relevant character only by virtue of its determinate relations to the others—that relevant character being, in effect, nothing other than its “place” in the larger pattern or whole. The obvious example is a move or play in a game: pushing around a little piece of plastic shaped like a turret could only amount to a rook move in an appropriate spatial and temporal context of other chess pieces and moves. To call it a rook move apart from any such context is simply nonsense. Likewise, so the reasoning goes, to regard any phenomenon as intentional or normative in isolation from the relevant whole, is also nonsense. And since, in the case of mental attributions, the relevant whole must include the individual’s environment and/or community, the Cartesian independence of the mental realm is impossible.
While undeniably important and compelling, considerations like these seem to me seriously incomplete and potentially distorting. They remain theoretical or intellectual in a way that not only does not undermine but actually reinforces an aspect of the Cartesian separation that is still so pervasive as to be almost invisible. In particular, interrelationist accounts retain a principled distinction between the mental and the corporeal—a distinction that is reflected in contrasts like semantics versus syntax, the space of reasons versus the space of causes, or the intentional versus the physical vocabulary. (Notice that each of these contrasts can be heard either as higher versus lower “level” or as inner versus outer “sphere”.) The contrary of this separation—or battery of separations—is not interrelationist holism, but something that I would like to call the intimacy of the mind’s embodiment and embeddedness in the world. The term ‘intimacy’ is meant to suggest more than just necessary interrelation or interdependence but a kind of commingling or integralness of mind, body, and world—that is, to undermine their very distinctness. The challenge is as much to spell out what this could mean as to make a case for it. Indeed, no sooner does such a possibility seem intelligible at all, than ways to bring out its plausibility and significance turn up everywhere.
There is little original in what follows. The strategy will be to bring some well-known principles of systems analysis to bear on the mind-body-world “system” in a way that refocuses questions of division and unity, and then to canvass a selection of investigations and proposals—some fairly recent, others not—in the light of this new focus. The hope is that these superficially disparate ideas, none of them new, will seem to converge around the theme of intimacy in a way that illuminates and supports them all. Sorting and aligning issues in this manner has sometimes been discussed under titles like ‘embedded computation’ and ‘situated cognition’.
The simplest introduction to the range of phenomena I want to explore is the beautiful parable with which Herbert Simon opens chapter three of The Sciences of the Artificial, a chapter to which he gives the subtitle “Embedding Artifice in Nature”.
We watch an ant make his [sic] laborious way across a wind- and wave-molded beach. He moves ahead, angles to the right to ease his climb up a steep dunelet, detours around a pebble, stops for a moment to exchange information with a compatriot. Thus he makes his weaving, halting way back to his home.…Viewed as a geometric figure, the ant’s path is irregular, complex, hard to describe. But its complexity is really a complexity in the surface of the beach, not a complexity in the ant. (1969, 63–64)
Simon summarizes the lesson of his parable twice, word-for-word the same, except for those indicating the subject of the behavior.
An ant [A man], viewed as a behaving system, is quite simple. The apparent complexity of its [his] behavior over time is largely a reflection of the complexity of the environment in which it [he] finds itself [himself]. (64–65; italics in originals.)
This lesson can be taken in two rather different ways. On the one hand, one might heave a sigh of scientific relief: understanding people as behaving systems is going to be easier than we thought, because so much of the apparent complexity in their behavior is due to factors external to them, and hence external to our problem. On the other hand, one might see the problem itself as transformed: since the relevant complexity in the observed behavior depends on so much more than the behaving system itself, the investigation cannot be restricted to that system alone, but must extend to some larger structure of which it is only a fraction.
That Simon himself took the lesson in the first way is evident from the two “hedges” that he immediately offers—both of which strike me as quite remarkable (and ultimately untenable).
Now I should like to hedge my bets a little. Instead of trying to consider the “whole man,” fully equipped with glands and viscera, I should like to limit the discussion to Homo sapiens, “thinking man.”…I should also like to hedge my bets in a second way, for a human being can store away in memory a great furniture of information that can be evoked by appropriate stimuli. Hence I would like to view this information-packed memory less as part of the organism than as part of the environment to which it adapts. (65; second hedge only in the second edition.)
With these qualifications in place, Simon can safely turn his attention away not only from sand dunes and pebbles, but also from human knowledge, culture, the body, and the world, so as to concentrate on cryptarithmetic and nonsense syllables—all in support of his view that the human “information processing system” (essentially a glorified CPU) must be serial and rather simple. In effect, he wants to pare away enough of the real human being that what’s left is strikingly like an ant.
The alternative reading of the parable, however, is already suggested by the history of artificial intelligence research itself. Perhaps the largest-scale trend in this history has been precisely counter to the suggestion of relegating knowledge (“information-packed memory”) to the “environment”—that is, regarding it as external to the problem of understanding intelligence. The essential point of systems based on semantic nets, frames, internal models, prototypes, and “common sense” is that, except in very special circumstances, the intelligent performance of a system depends more directly on the particular interconnectedness and organization of its knowledge, than on any reasoning or processing power. Note that the issue is not the quantity of information, but its specific quality—its concrete structure. Everything is in the details—that these items are grouped together, that there is a cross-reference from here to there, that this topic appears in that index with those keys—in such a way that abstracting the form of the knowledge from its content would make no sense. In other words, according to this trend in AI, to study (or build) intelligent systems is, above all, to study (or build) large, concrete knowledge structures—not simple processors.
Until recently, however, nearly all of this research has retained the assumption that the relevant “furniture of information” is implemented as complex symbol structures that are, in many respects, just like the contents of the traditional Cartesian mind. In particular, they are internal to the individual agent, and different in kind from any physiology or hardware. The explorations that follow can be seen as trying out the second reading of Simon’s parable in a more radical way. If the significant complexity of intelligent behavior depends intimately on the concrete details of the agent’s embodiment and worldly situation, then perhaps intelligence as such should be understood as characteristic, in the first instance, of some more comprehensive structure than an internal, disembodied “mind”, whether artificial or natural.
Of course, human intelligence is embodied and embedded; nobody denies that. The question is how important this fact is to the nature of intelligence. One way to put the question is to ask whether we can in principle partition off the intellect (or mind) from the body and/or the world. “Partition off” does not this time mean isolation or removal. That is, we can grant the holist thesis that mind would be impossible in the absence of body and/or world, and still ask whether it can be understood as a distinct and well-defined subsystem within the necessary larger whole. But this requires a brief discussion of the principles for dividing systems into distinct subsystems along nonarbitrary lines.
It is a fine testimony to the depth and breadth of Simon’s slender volume that, in a later chapter, entitled “The Architecture of Complexity”, he addresses this very issue: On what principled basis are large systems decomposable into subsystems? And, to answer it, he invites us to
distinguish between the interactions among subsystems, on the one hand, and the interactions within subsystems—that is, among the parts of those subsystems—on the other. The interactions at the different levels may be, and often will be, of different orders of magnitude. (209)
What is he getting at? Consider a television set in comparison to a block of marble. The former, we are inclined to say, is highly systematic, composed of many nested interacting subsystems, whereas the latter is hardly systematic at all. Why? One suggestion might be that the TV is composed of many different kinds of material, arranged in complicated shapes and patterns, whereas the marble contains relatively few materials and is nearly (though not quite) homogeneous. This cannot be the right answer, however, because a computer microchip (an integrated circuit) is surely more systematic than a compost of rotting table scraps, even though the former contains relatively few materials and is nearly homogeneous, whereas the latter is diverse and messy.
Rather, the difference must lie in the nature of the discontinuities within the whole, and the character of the interactions across them. To see this, think of how the TV is organized. If we suppose that, at some level of analysis, it consists of a thousand components, then we can ask how these components are distinguished. One possible decomposition would be neatly geometrical: assuming the set is roughly cubical, divide it into ten equal slices along each axis, to yield a thousand smaller cubes, of which the entire set exactly consists. What’s wrong with this “decomposition”? Well, consider one of the “component” cubes—say, one near the center. It contains half of a transistor, two thirds of a capacitor, several fragments of wire, a small triangle of glass from the picture tube, and a lot of hot air. Obviously, this is an incoherent jumble that makes no sense—even though a thousand equally crazy “pieces”, put together exactly right, would make up a TV set. Our task is to say why.
A resistor is a quintessential electronic component. It has two wires coming out of it, and its only “job”, as the name suggests, is to resist (to some specified degree) the flow of electricity between them. It doesn’t matter how it does that job—nor, within limits, does anything else about it matter—just so long as it does that job properly and reliably, and doesn’t interfere with any other components. An electronic component, like a resistor, is a relatively independent and self-contained portion of a larger electronic circuit. This means several things. In the first place, it means that the resistor does not interact with the rest of the system except through its circuit connections—namely, those two wires. That is, nothing that happens outside of it affects anything that happens inside, or vice versa, except by affecting the currents in those connections. (To be more precise, all effects other than these are negligible, either because they are so slight or because they are irrelevant.) Second, it means that the relevant interactions through those connections are themselves well-defined, reliable, and relatively simple. For instance, it’s only a flow of electrons, not of chemicals, contagion, or contraband. Finally, it means that it is not itself a composite of components at a comparable level of independence and self-containedness: a resistor plus a capacitor do not add up to a distinct component. (However, a suitable larger arrangement of resistors, capacitors, and transistors might add up to a pre-amp—which could in turn be a component in a higher-level system.)
An electronic component’s connecting wires constitute its “interface” to the rest of the system. In careful usage, the notions of component, system, and interface should all be understood together and in terms of one another. A component is a relatively independent and self-contained portion of a system in the sense that it relevantly interacts with other components only through interfaces between them (and contains no internal interfaces at the same level). An interface is a point of interactive “contact” between components such that the relevant interactions are well-defined, reliable, and relatively simple. A system is a relatively independent and self-contained composite of components interacting at interfaces. So the pre-amp mentioned above would be both a component and a system; such a component system is often called a subsystem. Though these concepts are all defined in terms of one another, they are not therefore circular or empty, because they collectively involve the further notions of relative independence, simplicity, relevance, and interaction.
An important consequence is that genuine components are in principle replaceable by functional equivalents. For instance, a resistor in an electronic circuit can be replaced by any other that has the same resistance value (and perhaps meets a few other specifications). The circuit as a whole will continue to function as before, because all that mattered about that resistor in the first place was its resistance via a simple interface—and, by stipulation, the replacement matches that. (Of course, rifles had components before Eli Whitney invented interchangeable parts. But that was basically a difference of degree—specifically, of how well-defined the interfaces were and how readily interchangeable the parts were.)
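The point that a component is exhausted by its interface, and hence replaceable by any functional equivalent, can be sketched in a few lines of code. (The resistor model and the voltage-divider circuit below are invented for illustration; nothing here is Simon's or anyone else's formalism.)

```python
# Illustrative sketch: a "component" is anything that honors a narrow,
# well-defined interface. For a resistor, that interface is just two
# terminals and a resistance value; nothing else about the part matters.

class Resistor:
    """A part whose entire relevant character is its resistance in ohms."""
    def __init__(self, ohms: float):
        self.resistance = ohms

    def current(self, volts: float) -> float:
        # The component's one "job": I = V / R.
        return volts / self.resistance

def divider_output(v_in: float, r1: Resistor, r2: Resistor) -> float:
    """A two-resistor voltage divider cares only about resistance values."""
    return v_in * r2.resistance / (r1.resistance + r2.resistance)

# Swapping in a functionally equivalent replacement changes nothing:
original = Resistor(1000.0)
replacement = Resistor(1000.0)   # a different part, same interface value
assert divider_output(9.0, original, Resistor(2000.0)) == \
       divider_output(9.0, replacement, Resistor(2000.0))
```

Nothing about *how* a resistor does its job enters the calculation; only the value presented at the interface does, which is why the swap is invisible to the circuit as a whole.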
Return now to the thousand little cubes making up a TV set. Are they components, in the strict sense? Obviously not, because they are not even “relatively” independent and self-contained. Or, what comes to the same thing, the surfaces separating them are not proper interfaces. That is, the interactions required across those surfaces, for the set to work, are absurdly complex and irregular, with no hope of clear definition or reliable implementation. Just imagine trying to assemble a new set from a thousand such cubes taken one each from a thousand other sets! Yet a TV can easily be made out of parts taken from others, if only the divisions are made at genuine interfaces.
The point is even more vivid if we consider instead a system that moves, such as an engine or an animal: then the cubes wouldn’t so much as contain the same physical hunks through time, and a consistent “interface” wouldn’t even be conceivable. Yet decomposition of such systems into simpler components is perfectly standard and straightforward, if their boundaries are allowed to move with them. For instance, a connecting rod between a piston and crankshaft is relatively self-contained, and has very well-defined interfaces in its two bearing surfaces; and all that matters about them is that their axes be kept reliably parallel and a certain distance apart. Again, the component as such is delimited by its simple, reliable interfaces.
Examples like electronic circuits, mechanisms, and even organisms can leave the impression that the boundaries of components and subsystems are set by corporeal discontinuities—virtually complete, except at the interfaces, where narrowly defined contacts are permitted. It is particularly important for our purposes to counter this impression, since the corporeal discontinuity between our bodies and the world—the very discontinuity that determines these bodies as bodies—misleadingly enhances the apparent significance of bodily surfaces as relevant interfaces for the understanding of other phenomena, such as intelligence.
That systematic interfaces need not coincide with corporeal surfaces can be shown by example. Large organizations, like governments, corporations, and universities, are almost always subdivided into various divisions, departments, and units. But the correspondence between these demarcations and corporeal boundaries is at best haphazard, and never essential. Indeed, as more and more business is conducted via worldwide communication networks, the physical locations of personnel and data become practically irrelevant. What matter instead are the access codes, permission levels, distribution lists, private addresses, priority orderings, and so on, that determine where information flows and what gets attended to. It is the structure of these, ultimately, that determines departmentalization and hierarchy.
Members of a single department or unit tend to work more closely together, sharing resources and concerns, than do members of different departments. Likewise, units of the same division interact more often and more intimately than do units of different divisions; and so on. Nothing depends, ultimately, on who is in what building, or on what continent—as Simon himself clearly appreciated.
Most physical and biological hierarchies are described in spatial terms. We detect the organelles in a cell in the way we detect the raisins in a cake—they are “visibly” differentiated substructures localized spatially in the larger structure. On the other hand, we propose to identify social hierarchies not by observing who lives close to whom but by observing who interacts with whom. These two points of view can be reconciled by defining hierarchy in terms of intensity of interaction, but observing that in most biological and physical systems relatively intense interaction implies relative spatial propinquity. (199)
“Intensity” of interaction here means something like how “tightly” things are coupled, or even how “close-knit” they are—that is, the degree to which the behavior of each affects or constrains the other. Heard in this way, it can serve as a generic notion encompassing the mechanical integrity of a connecting rod and the electrical unity of a resistor, as well as the social or institutional cohesiveness of a group. The different parts of a connecting rod interact so intensely, for example, that they always move rigidly together; by comparison, its interactions with the piston and the crankshaft are “looser,” allowing independent rotation about a common axis. Further, comparing these intensities can be the first step in accounts, respectively, of relative independence and interface simplicity; for each can be seen as a matter of less intense interaction—looser coupling—externally, as compared to internally.
What Simon mentions only in passing, however, without the emphasis it will need for our purposes, is the motivation for treating systems in this way. In a brief subsection entitled “Near Decomposability and Comprehensibility”, he writes:
The fact that many complex systems have a nearly decomposable, hierarchic structure is a major facilitating factor enabling us to understand, describe, and even ‘see’ such systems and their parts. (218–219)
‘Nearly decomposable systems’ is his term for systems of relatively independent interacting components with simple interfaces between them. So the point about comprehensibility can be paraphrased as follows: finding, in something complicated and hard to understand, a set of simple reliable interfaces, dividing it into relatively independent components, is a way of rendering it intelligible.
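The idea can be made concrete in a small sketch, under the simplifying assumption that pairwise interaction intensities can be summarized as numbers: thresholding those intensities and grouping the strongly coupled parts recovers the "nearly decomposable" subsystems, leaving the weak residual couplings as the interfaces. (This is an illustration of the principle, not Simon's own formalism; the part names and intensities are invented.)

```python
# Illustrative sketch: recover "nearly decomposable" subsystems from a
# table of pairwise interaction intensities. Parts whose mutual coupling
# meets a threshold are grouped together; weak (cross-interface) couplings
# are ignored. This amounts to finding the connected components of the
# thresholded interaction graph.

def subsystems(intensity, threshold):
    """intensity: dict {(i, j): strength}; returns a list of part-sets."""
    parts = {p for pair in intensity for p in pair}
    adjacent = {p: set() for p in parts}
    for (i, j), strength in intensity.items():
        if i != j and strength >= threshold:
            adjacent[i].add(j)
            adjacent[j].add(i)
    seen, groups = set(), []
    for start in sorted(parts):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                       # depth-first traversal
            p = stack.pop()
            if p in component:
                continue
            component.add(p)
            stack.extend(adjacent[p] - component)
        seen |= component
        groups.append(component)
    return groups

# Two tightly knit clusters joined by one weak link (the "interface"):
coupling = {("a", "b"): 9.0, ("b", "c"): 8.5,   # strongly coupled trio
            ("d", "e"): 9.2,                    # strongly coupled pair
            ("c", "d"): 0.3}                    # weak interface between them
print(subsystems(coupling, threshold=1.0))      # two groups: {a, b, c} and {d, e}
```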
The significance of this can be brought out by approaching it from another side. Biological and electronic systems are also in some sense physical—as, indeed, are social and (other) information systems. But in that case, more than one kind of interaction, and accordingly more than one kind of decomposition, would seem to be possible in what is somehow the same “stuff”. How, then, are we to decide which interactions and decompositions are the important ones? Once this question is asked, however, the answer is obvious: it depends on what we’re interested in—which is to say, it depends on what phenomena we are trying to understand. Thus, when we turn to the mind-body-world “system”, and wonder how, perhaps, to decompose it, our considerations will perforce be relative to some prior identification of what is to be understood.
Part of the reason, for instance, that the structure of the beach is as important as the structure of the ant in Simon’s parable is that the ant’s actual path is determined in real time by close interaction between the ant and the concrete details of the beach’s surface. If, by contrast, the ant were responding to an internal model or representation of the beach, instead of responding to the beach itself, or if it just contained a list of steps and turns which it followed slavishly, then the importance of the beach would be reduced or eliminated. The other part of the reason, however, and in some sense the prior part, is that what we want to understand in the first place is the ant’s path. If we were interested instead in its respiration or its immune system, then the beach would be largely irrelevant, regardless of how tightly the ant is coupled to it when walking. In other words, which close interactions matter, when considering the scope and structure of systems, depends fundamentally on what we’re interested in—that is, what we’re trying to understand.
Here, then, is where the account of decomposition takes hold. If what we’re interested in is the path, and if the ant relies mostly on its own internal structure to guide its steps, counting on the ground just for friction and support, then the ant and beach are two relatively independent components or systems, with a well-defined simple interface at the soles of its feet. If, on the other hand, there is constant close coupling between the ant and the details of the beach surface, and if this coupling is crucial in determining the actual path, then, for purposes of understanding that path, the ant and beach must be regarded more as an integrated unit than as a pair of distinct components. This is the simplest archetype of what I mean by intimacy.
Insect examples have all the advantages and disadvantages of simple archetypes. They are, of course, wonderfully clear, as far as they go; but they don’t go very far. We will try to work our way up toward more interesting cases via a series of intermediaries. Simon muses at one point (64) that if someone were to build insect-like robots, their behavior on a beach would be much like that of his ant. Little did he know that, two decades later, such a project would be in full swing in the laboratory of Rod Brooks at MIT. It is perhaps slightly ironic that part of what drives Brooks’s efforts is a dissatisfaction with the symbol-manipulation approach to artificial intelligence pioneered by Simon and Allen Newell in the 1950’s. As we shall see, however, this is no accident.
Brooks’s best known “creature,” named Herbert (after Simon), is a self-powered, wheeled contraption, about the size of a small trash can, with various sensors, one moveable arm on top, and surprisingly little compute power. Its lot in life is to buzz around the MIT AI Lab looking for empty soda pop cans, pick them up, and return them to a central bin. Herbert was built (around 1990) and actually worked (albeit clumsily). What makes this noteworthy, compared, say, to robots of the 1970’s, is that the labs and offices in which Herbert worked were in no way specially prepared: there were no guidelines painted on the floors or walls, the typical mess and clutter of real work space were not cleaned up, people carried on with their own business as usual, and so on. So Herbert managed to negotiate a relatively inhospitable, changing environment, do its job, and stay mostly out of trouble—with roughly the proficiency of a crustacean.
What matters for us is not this modest success but the design principles on which Herbert is based. Two points deserve emphasis. First, Brooks uses what he calls the “subsumption architecture”, according to which systems are decomposed not in the familiar way by local functions or faculties, but rather by global activities or tasks.
[This] alternative decomposition makes no distinction between peripheral systems, such as vision, and central systems. Rather the fundamental slicing up of an intelligent system is in the orthogonal direction dividing it into activity producing subsystems. Each activity or behavior producing system individually connects sensing to action. We refer to an activity producing system as a layer. An activity is a pattern of interactions with the world. Another name for our activities might well be skill…(1991, 146; see also Chapter 23 of this volume, p. 472)
Thus, Herbert has one subsystem for detecting and avoiding obstacles in its path, another for wandering around, a third for finding distant soda cans and homing in on them, a fourth for noticing nearby soda cans and putting its hand around them, a fifth for detecting something between its fingers and closing them, and so on…fourteen in all. What’s striking is that these are all complete input/output systems, more or less independent of each other. They can’t be entirely independent, of course, because, for instance, Herbert has only one set of wheels; so if two different subsystems undertake to move the robot at the same moment, one must dominate—through some interface. But the bulk of the interactions, the tightest couplings, are within the respective activity layers—including, as Brooks explicitly points out, interactions with the world.
In other words, each of the various activities of this system is like the walking of Simon’s ant. Each involves constant close interaction with certain specific aspects of the environment, and can only be understood in terms thereof. So the structures of the respective aspects of the environment are at least as important as the structures of the internal portions of the corresponding layers in rendering the different activities intelligible. Herbert has fourteen relatively independent, closely knit subsystems, each encompassing both structures within the robot and structures outside of it. To put it one last way, one that foreshadows where we’re going, each of Herbert’s highest-level subsystems is somewhat mental, somewhat bodily, and somewhat worldly. That is, according to Simon’s principles of intensity of interaction, the primary division is not into mind, body, and world, but rather into “layers” that cut across these in various ways. And, in particular, the outer surface of the robot is not a primary interface. (Of course, there may be further subdivisions within the respective layers, which may or may not take portions of this surface as subsidiary interfaces.)
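A minimal sketch of a subsumption-style arbiter may help fix the idea. The layer names, sensor keys, and commands below are invented for illustration, and real subsumption networks (including Herbert's) are wired as asynchronous finite-state machines rather than polled in sequence like this; but the essential point survives: each layer is a complete sense-to-act loop, and the shared actuator is the only interface at which one layer dominates another.

```python
# Illustrative sketch of Brooks-style subsumption: every layer connects
# sensing to action on its own; higher-priority layers suppress lower
# ones only at the shared actuator.

def avoid_layer(sensors):
    # Lowest-level competence: a complete input/output behavior by itself.
    if sensors["obstacle_ahead"]:
        return "turn_left"
    return None                 # no opinion: defer to other layers

def home_in_layer(sensors):
    # Another complete loop: steer toward a sighted soda can.
    if sensors["can_in_view"]:
        return "approach_can"
    return None

def wander_layer(sensors):
    return "go_forward"         # always has a default urge

def arbitrate(sensors, layers):
    """Earlier layers in the list subsume (override) later ones."""
    for layer in layers:
        command = layer(sensors)
        if command is not None:
            return command

# Obstacle avoidance subsumes everything; homing subsumes wandering.
layers = [avoid_layer, home_in_layer, wander_layer]
print(arbitrate({"obstacle_ahead": False, "can_in_view": True}, layers))
# → approach_can
```

Note that no central model mediates between the layers: each one couples its own sensors to the actuators directly, so most of the interaction stays inside a layer, exactly as the interaction-intensity criterion would predict.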
The second point that deserves emphasis, closely related to the first, is captured in a slogan that Brooks proposes: The world is its own best model. (1990, 5) This is precisely to repudiate designs like the alternatives I mentioned earlier for Simon’s ant: that is, the ant should not contain any inner model or representation of the beach, nor an inner list of step and turn instructions. These alternatives would substitute complexity within the organism for intensity of interaction between the organism and its environment. But Brooks, as his slogan indicates, is very much against that. Why?
We can put the answer in another slogan that Brooks would probably like: Perception is cheap, representation expensive. Such a slogan might surprise many AI workers, who are acutely aware of how difficult pattern recognition can be. But the point is that good enough perception is cheaper than good enough representation—where that means “good enough” to avoid serious errors. The trouble with representation is that, to be good enough, it must be relatively complete and relatively up to date, both of which are costly in a dynamic environment. Perception, by contrast, can remain happily ad hoc, dealing with concrete questions only as they arise. To take a homely example, it would be silly, for most purposes, to try to keep track of what shelf everything in the refrigerator is currently on; if and when you want something, just look.
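The refrigerator point can be put in a toy sketch: a cached representation must be actively maintained to stay "good enough", whereas just looking is always current. (The shelf contents here are, of course, invented.)

```python
# Illustrative sketch: a cached world-model goes stale the moment the
# world changes, while "just looking" is current by construction.

world = {"top_shelf": "milk", "bottom_shelf": "butter"}

# Strategy 1: maintain a representation (must be kept complete and current).
model = dict(world)             # snapshot taken now...
world["top_shelf"] = "juice"    # ...but the world changes behind its back

# Strategy 2: the world is its own best model — query it only when needed.
def just_look(shelf):
    return world[shelf]

print(model["top_shelf"])       # → milk   (stale)
print(just_look("top_shelf"))   # → juice  (current)
```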
Why would anyone have supposed otherwise? The answer lies in two deep assumptions that have informed symbolic AI from the beginning—including the work of Newell and Simon. The first is that intelligence is best manifested in solving problems, especially hard problems. The second is that problems are best solved by a process of reasoning: working from a statement of the problem and such ancillary knowledge as is available to a statement of the solution. The greatest triumphs in the history of the field are its demonstrations that these ideals can be realized mechanically—principally via formal inference, heuristically guided search, and structured knowledge representation. But notice how they bias the orientation toward representation and away from perception. Not only must the problem and the relevant knowledge be represented, generally in some symbolic formalism, but so also all the intermediate states that are explored on the way to the solution. The generation, use, and management of all these representations then becomes the paramount concern.
Perception, under such a regime, is reduced to a peripheral channel through which the problem is initially posed, and incidental facts are supplied; it might as well be a teletype. In most of AI, in fact, the process of perception has been conceived as transduction: some special preprocessor takes optical or auditory input, “recognizes” it, and then produces a symbolic output for the main system—not unlike what the system could have gotten from a teletype. What’s pertinent about a teletype here is that it’s a narrow-bandwidth device—the very antithesis of tight coupling.
For instance, the number of “bits” of information in the input to a perceptual system is enormous compared to the number in a typical symbolic description. So a “visual transducer” that responds to a sleeping brown dog with some expression like “Lo, a sleeping brown dog” has effected a huge data reduction. And this is usually regarded as a benefit, because, without such a reduction, a symbolic system would be overwhelmed. But it is also a serious bottleneck in the system’s ability to be in close touch with its environment. Organisms with perceptual systems not encumbered by such bottlenecks could have significant advantages in sensitivity and responsiveness. The alternative “wide bandwidth” coupling, however, is precisely what, by Simon’s systems-analytic criterion, would undermine or downgrade the organism/environment boundary as an important decompositional interface—just as Brooks proposes.
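The scale of the data reduction can be made vivid with some back-of-envelope arithmetic. All the figures below are illustrative assumptions (a modest image resolution and refresh rate, an ASCII encoding of the description), not measurements from the text:

```python
# Raw visual input: a modest 1000 x 1000 image, 24 bits per pixel,
# refreshed 10 times per second.
pixels = 1000 * 1000
bits_per_pixel = 24
frames_per_second = 10
input_bits_per_second = pixels * bits_per_pixel * frames_per_second

# Symbolic output: "Lo, a sleeping brown dog" as 8-bit characters,
# emitted, say, once per second.
description = "Lo, a sleeping brown dog"
output_bits_per_second = len(description) * 8

reduction_factor = input_bits_per_second // output_bits_per_second
print(f"input:  {input_bits_per_second:,} bits/s")
print(f"output: {output_bits_per_second:,} bits/s")
print(f"reduction: roughly {reduction_factor:,}x")
```

Even with these deliberately conservative numbers, the "visual transducer" discards better than a million bits for every bit it passes on, which is exactly the narrowing of bandwidth at issue.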
Models that emphasize internal symbolic representations insinuate bottlenecks into the understanding of action, just as they do for perception, and for much the same reason. The product of a rational problem-solving process is a low-bandwidth sequence of symbol structures, which can either report the results (say, on a teletype) or send instructions to an “output transducer”. But skillful or adept engagement with a situation is as likely to profit from a wide-bandwidth coupling as is responsiveness to it. (We will return to this following the next section.)
In the meantime, notice that the very distinction between perception and action is itself artificially emphasized and sharpened by the image of a central processor or mind working between them, receiving “input” from the one and then (later) sending “output” to the other. The primary instance is rather interaction, which is simultaneously perceptive and active, richly integrated in real time. Thus, what’s noteworthy about our refrigerator aptitudes is not just, or even mainly, that we can visually identify what’s there, but rather the fact that we can, easily and reliably, reach around the milk and over the baked beans to lift out the orange juice—without spilling any of them. This high-bandwidth hand-eye coordination—or, better, hand-eye-refrigerator coordination—is what Brooks calls an “activity” or “skill” (though much more advanced than his robots). There is little reason to believe that symbol processing has much to do with it—unless one is already committed to the view that reasoning must underlie all flexible competence.
The psychologist James J. Gibson makes several related points at a level somewhat higher than insects. Early in The Ecological Approach to Visual Perception, he begins a section entitled “The Mutuality of Animal and Environment” by explaining that
the words animal and environment make an inseparable pair. Each term implies the other. No animal could exist without an environment surrounding it. Equally, although not so obvious, an environment implies an animal (or at least an organism) to be surrounded. This means that the surface of the earth, millions of years before life developed on it, was not an environment, properly speaking. (1979, 8)
This is not a fussy semantic quibble, but a subtle observation about levels and units of intelligibility. We can only understand animals as perceivers if we consider them as inseparably related to an environment, which is itself understood in terms appropriate to that animal.
Central to Gibson’s “ecological approach” is his account of what it is that an animal perceives and how. Visual perception cannot be understood, he maintains, if one starts from the perspective of physical optics. A system that sees—a sighted animal—is not responsive, in the first instance, to physically simple properties of light, like color and brightness, but rather to visible features of the environment that matter to it. Gibson calls such features “affordances”.
The affordances of the environment are what it offers the animal, what it provides or furnishes, for good or ill. The verb to afford is found in the dictionary, but the noun affordance is not. I have made it up. I mean by it something that refers to both the environment and the animal in a way that no existing term does. It implies the complementarity of the animal and the environment. (127)
So for example, a suitably sturdy and flat surface could afford a place to stand or walk to an animal of a certain sort—not to a fish, of course, and what affords standing room to a sparrow might not to a cat (a matter of some importance to both). Nooks can afford shelter and seclusion, green leaves or smaller neighbors can afford lunch, larger neighbors can afford attack, and so on—all depending on who’s looking and with what interests.
What’s important (and controversial) here is not the idea of affordances as such, but the claim that they can be perceived—as opposed to inferred. “The central question for the theory of affordances is not whether they exist and are real but whether information is available in ambient light for perceiving them.” (140) Intuitively, the startling thesis is this: it can be a feature of the ambient light itself that, for instance, something over there “looks edible” or “looks dangerous” (from here, to a creature like me). This would have to be a very complicated feature indeed, practically impossible to specify (in physical terms). Gibson calls such features “high-order invariants”, and makes essentially two points about them. First:
The trouble with the assumption that high-order optical invariants specify high-order affordances is that experimenters, accustomed to working in the laboratory with low-order stimulus variables, cannot think of a way to measure them. (141)
In other words, if perceptual systems “pick up” high-order invariants, then the surfaces of the eyes and other sense organs cannot be interfaces—because the relevant interactions are not well-defined and relatively simple.
The second is a point that we have seen Gibson emphasize several times already, and moreover just what we would expect to follow from the lack of a well-defined interface.
The hypothesis of information in ambient light to specify affordances is the culmination of ecological optics. The notion of invariants that are related at one extreme to the motives and needs of an observer and at the other extreme to the substances and surfaces of a world provides a new approach to psychology. (143)
This is not merely to reiterate the complementarity of the animal and its environment, but also to associate that integration with the “high order” affordances and invariants that constitute their interaction. Specific complexity in the perceptual capacities of the organism itself is what sustains the corresponding complexity in what it perceives, via tightly coupled, high-bandwidth interaction—at the level of description appropriate to understanding perception. Thus, the culmination of ecological optics is, in our terms, the intimacy of perceiver and perceived.
So far, with the exception of the refrigerator example, all our discussions have involved relatively primitive creatures or systems. This might give the impression that intimate intermingling of mind, body, and world—in particular, lack of mental distinctness—is characteristic mainly of lower life forms, as opposed to people. Descartes, after all, held that animals have no minds at all, and are merely physical. The space of reasons, by contrast, is often seen as pre-eminently or even exclusively human; and it is chiefly ratiocination that seems to require input/output interfaces and transducers to enable perception and action. (Transduction, remember, is the function that Descartes assigned to the pineal gland.) So maybe the lesson is not to undermine the mind/body separation across the board, but rather to restrict it to ourselves—the rational animals.
On the contrary, however, I want to suggest that the human mind may be more intimately intermingled with its body and its world than is any other, and that this is one of its distinctive advantages. Moreover, I think that Simon’s criterion of intensity of interaction, at the relevant level of intelligibility, will be just the right tool for making this visible. Let us return to the phenomenon of skillful behavior and perception, and consider its structure in more detail. In Part III of What Computers Can’t Do, in a chapter entitled “The Role of the Body in Intelligent Behavior”, Hubert L. Dreyfus writes:
Generally, in acquiring a skill—in learning to drive, dance, or pronounce a foreign language, for example—at first we must slowly, awkwardly, and consciously follow the rules. But then there comes a moment when we finally can perform automatically. At this point we do not seem to be simply dropping these same rigid rules into unconsciousness; rather we seem to have picked up the muscular gestalt which gives our behavior a new flexibility and smoothness. The same holds for acquiring the skill of perception. (1972/92, 248–249; second edition wording)
A “muscular gestalt”? What have the muscles got to do with it? We react with questions like these, perhaps even with a trace of exasperation, because of a very seductive traditional story. When we are acting intelligently, our rational intellect is (consciously and/or unconsciously) taking account of various facts at its disposal, figuring out what to do, and then issuing appropriate output instructions. These instructions are converted by output transducers into physical configurations (mechanical forces, electric currents, chemical concentrations,…) that result in the requisite bodily behavior. The transducers, therefore, function as (or define) interfaces between the rational and the physical. As such, they also provide a natural point of subdivision—in the sense that any alternative output subsystem that responded to the same instructions with the same behavior could be substituted without making any significant difference to the intellectual part. On that picture, then, the muscles would fall entirely on the physical side, and not be relevant to the intelligent (sub)system at all—even as “gestalts”.
Well, are there transducers between our minds and our bodies? From a certain all-too-easy perspective, the question can seem obtuse: of course there are. Almost by definition, it seems, there has to be a conversion between the symbolic or conceptual contents of our minds and the physical processes in our bodies; and that conversion just is transduction. But Dreyfus is, in effect, denying this—not by denying that there are minds or that there are bodies, but by denying that there needs to be any interface or conversion between them.
The fateful die is already cast in the image of the intellect figuring things out and then issuing instructions. An instruction (according to conventional wisdom) is a syntactic expression which, by virtue of belonging to a suitably interpretable formal system, carries a certain sort of semantic content. Specifically, its content does not depend on how or whether it might be acted upon by any particular physical output system. For instance, if I decide to type the letter ‘A’, the content of the forthcoming instruction wouldn’t depend on it being an instruction to my fingers, as opposed to any others, or even some robotic prosthesis. Any output system that could take that instruction and type an ‘A’—and, mutatis mutandis, other instructions and behaviors—would do as well. The idea that there are such instructions is morally equivalent to the idea that there are transducers.
A different—and incompatible—story might go like this. There are tens of millions (or whatever) of neural pathways leading out of my brain (or neocortex, or whatever) into various muscle fibers in my fingers, hands, wrists, arms, shoulders, and so on, and also from various tactile and proprioceptive cells back again. Each time I type a letter, a substantial fraction of these fire at various frequencies, and in various temporal relations to one another. But that some particular pulse pattern, on some occasion, should result in my typing an ‘A’ depends on many contingencies, over and above just which pattern of pulses it happens to be.
In the first place, it depends on the lengths of my fingers, the strengths and quicknesses of my muscles, the shapes of my joints, and the like. Of course, whatever else I might do with my hands, from typing the rest of the alphabet to tying my shoes, would likewise depend simultaneously on particular pulse patterns and these other concrete contingencies. But there need be no way to “factor out” the respective contributions of these different dependencies, such that contents could consistently be assigned to pulse patterns independent of which fingers they’re destined for. That is to say, there need be no way—even in principle, and with God’s own microsurgery—to reconnect my neurons to anyone else’s fingers, such that I could reliably type or tie my shoes with them. It would be like trying to assemble the tubes from a thousand TV sets into a single new one. But, in that case, what any given pattern “means” depends on it being a pattern specifically for my particular fingers—or, to use Dreyfus’s phrase, for fingers with my “muscular gestalts”.
Perhaps an analogy would help—even if it’s fairly far-fetched. Imagine an encryption algorithm with the following three features: it uses very large encryption keys (tens of millions of bits, just for instance); cryptograms, even for quite brief messages, are comparable in size to the keys themselves; and it is tremendously redundant, in the sense that (for each key) countless distinct cryptograms would decode to the same message. Now, consider, for a given key and message, all the cryptograms that would decode to that message; and ask whether it could make any sense to speak of what these cryptograms have in common apart from that particular key. It’s hard to see how it could. Yet, if individual cryptograms have any meaning at all, then these must all have the same meaning; so either cryptograms are meaningless, or they mean something only in conjunction with a particular key. Then the analogy works like this: each individual’s particular body—his or her own muscular gestalts—functions like a large encryption key; and the pulse patterns coming down from the brain are the cryptograms, which are either meaningless, or they mean something only in conjunction with that particular body. Either way, they aren’t instructions. This is only an analogy, however, because the activity of the fingers should not be regarded as “decoding neural messages”, but rather as an integral part of the “processing” that the brain and other neurons also contribute to.
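The analogy can even be simulated. The toy cipher below is an illustrative sketch, not real cryptography, and its parameters are arbitrary. It captures two of the three stipulated features, a very large key and massive redundancy (fresh randomness on every encryption, so countless distinct cryptograms decode to the same message); for brevity its cryptograms are small rather than key-sized. The point it makes concrete is that nothing the cryptograms have in common is specifiable apart from the key:

```python
import hashlib
import os

KEY_BYTES = 1 << 20  # a "large" key: roughly eight million bits

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random pad from (key, nonce) by iterated hashing."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(message: bytes, key: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh randomness: a distinct cryptogram each time
    pad = keystream(key, nonce, len(message))
    return nonce + bytes(m ^ p for m, p in zip(message, pad))

def decrypt(cryptogram: bytes, key: bytes) -> bytes:
    nonce, body = cryptogram[:16], cryptogram[16:]
    pad = keystream(key, nonce, len(body))
    return bytes(c ^ p for c, p in zip(body, pad))

key = os.urandom(KEY_BYTES)
msg = b"type the letter A"
c1, c2 = encrypt(msg, key), encrypt(msg, key)
# Distinct cryptograms, the same message, but only relative to this key.
assert c1 != c2
assert decrypt(c1, key) == decrypt(c2, key) == msg
```

On the analogy, `key` plays the role of a particular body, and `c1`, `c2` play the role of efferent pulse patterns: pulled away from the key, they are just noise.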
But even that may be overly sanguine. Whether a given efferent neural pattern will result in a typed ‘A’ depends also on how my fingers happen already to be deployed and moving, the angle of the keyboard, how tired I am, and so on—factors that aren’t constant, even for the short run. On different occasions, the same pattern will give different letters, and different patterns the same letter. The reason that I can type, despite all this, is that there are comparably rich afferent patterns forming a kind of feedback loop that constantly “recalibrates” the system. (In terms of the above analogy, it’s as if the encryption keys were not only large, but ever changing—the new ones being sent upstream all the time.) But that would mean that the “content” of any given neural output pattern would depend not only on the particular body that it’s connected to, but also on the concrete details of its current worldly situation.
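The recalibration story can be sketched as a minimal feedback loop, with all numbers hypothetical. Here the “gain” is a stand-in for the drifting bodily and worldly contingencies: the same efferent command produces different results on different occasions, and it is the afferent correction that keeps the target being hit:

```python
def reach(target, gain, steps=20, rate=0.5):
    """Issue a command, sense the result afferently, correct, repeat."""
    cmd = target  # naive first guess, as if the gain were 1.0
    for _ in range(steps):
        result = cmd * gain              # what the fingers actually do
        cmd += rate * (target - result)  # afferent recalibration
    return cmd * gain

# Open loop, the same command misses whenever the gain drifts from 1.0;
# closed loop, each bodily situation is accommodated on the fly.
for gain in [1.0, 1.1, 0.8, 1.3]:
    assert abs(reach(10.0, gain) - 10.0) < 0.01          # feedback hits the mark
    assert gain == 1.0 or abs(10.0 * gain - 10.0) > 0.5  # fixed command misses
```

The moral, in the terms of the text, is that what a given command “means” (which letter it types) is settled only in the running loop, not in the command taken by itself.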
If there were simple instructions—well-defined, repeatable messages—coming down the nerves from my brain to my fingers, then that narrow-bandwidth channel could be an interface to my fingers as physical transducers. Accordingly, it would be possible to divide the system there, and substitute “equivalent” fingers, in place of mine. Such an architecture is implicitly assumed by much of philosophy and most of AI. By contrast, the alternative that I have been sketching sees these nerves as carrying high-bandwidth interactions (high-intensity, in Simon’s terms), without any simple, well-defined structure. Thus, by the same criterion, we would not get two relatively independent separable components—a rational mind and a physical body, meeting at an interface—but rather a single closely-knit unity.
Nerve fibers, of course, aren’t the only high-bandwidth channel between my fingers and other parts of my body. The immune system, for instance, is extraordinarily complex and responsive. And even the circulatory transportation of metabolites and by-products carries, in the technical sense, a lot of “information”. Why focus on the nerves? Once the question is asked, the answer is easy to see. When we are trying to understand intelligence—as manifested, say, in intelligent behavior—we look to the complexity that is specific to that behavior. The distribution of antibodies and proteins to my hands, while no doubt essential to my typing ability, doesn’t differ very much depending on which letter I type; indeed, it doesn’t depend much on whether I’m typing, writing longhand, or tying bows.
But the complexity of the nervous system is task specific, and in two different ways. In the first place, at any moment, the pulse patterns needed for typing an ‘A’ differ from those needed for typing a ‘B’, not to mention from those needed for writing a ‘B’. The skills in question just are the abilities to get these things done right. And getting them right—this letter as opposed to that one—depends in specific detail on the actual pulse patterns, in a way that it does not depend on any details of my immune defenses. Second, as I acquired these skills, various more or less permanent changes were made in my neural pathways, in implementing all the relevant habits and reflexes. (Dreyfus, remember, was discussing learning.) And these changes, likewise, were specific to the skills learned—in a way that, for example, increases in circulatory capacity wouldn’t have been.
The point, however, is not to focus exclusively on the nervous system. Far from it. As emphasized above, actual performance depends on a number of other specific contingencies besides nerve impulses. Similarly, a range of other specifically relevant permanent changes are involved in the acquisition of skills. Thus, muscles of the requisite strengths, shapes, and limberness must be developed and maintained—differently for different skills. This is most conspicuous for very demanding manual abilities, like musicianship, surgery, and stage magic. Indeed, a professional violinist must acquire specific calluses, perhaps subtle grooves in the tips of her phalanges, even a certain set of the jaw.
The unity of mind and body can be promoted wholesale, perhaps, on the basis of general principles of monism or the unity of science. Such arguments are indifferent to variety and substructure within either the mental or the physical: everything is unceremoniously lumped together at one swoop. Here, by contrast, integration is offered at retail. In attempting to undermine the idea of an interface between the mind and the fingers, I am staking no claim to the liver or intestines. (Simon may be right about glands and viscera.) The idea is not to wipe out all distinctions and homogenize everything on general principles, but rather to call certain very familiar divisions into question, on the basis of considerations highly peculiar to them.
If a rug doesn’t fit, then flattening it out in one place will just move the hump to another. If there’s no interface between the brain and the fingers, then maybe it just is (has to be?) somewhere else. One might imagine, for instance, that the efferent nerves are high bandwidth because they (along with much of the spinal cord and some of the brain) are all part of a very sophisticated physical output system—the psycho-physical interface itself being further “in”. It could be that ratiocination (or representation more generally) occurs only in the cortex, or only the neocortex, or whatever. Then the relevant transductions would have to take place within the brain, between one part of it and another—not so far from the pineal gland, as luck might have it.
Now, my question is: Why would anyone ever be tempted by such a supposition? And the answer, surely, is the same presupposition that the mental must be different in kind—categorically different—from anything bodily or worldly; so, there must be some interface somewhere. For, without that a priori conviction, the obvious evidence of neuroanatomy would be decisive: the neural pathways from perception to action are high-bandwidth all the way through. If anything, the bandwidth increases toward the center, rather than narrowing down. There’s just no plausible constriction where well-defined instructions might be getting converted; that is, there’s no place where a counterpart of the above argument against efferent transduction wouldn’t apply.
Well then, might the hump slip out in the other direction, out past the fingers? It cannot be denied that the keyboard itself is a well-defined interface. No matter how complicated and various are the ways of striking the keys, the result is always limited to character codes from a set of a few hundred, in a slow, unambiguous, serial order. (It’s not for nothing that a teletype was earlier our paradigm of a low-bandwidth device.) By these lights, then, the meaningful (mental) extends all the way to the fingertips—maybe a touch beyond—and then interfaces to the physical world. This is, to be sure, a surprising fall-back position for defenders of mental/physical transduction—certainly not Cartesian in spirit. But could it work?
The first clue that it cannot is the artificiality of the example. Typing at a keyboard, though genuinely skillful activity, is quite atypical in the digital character of its success conditions. Dreyfus speaks instead of driving, dancing, or pronouncing a foreign language. These, too, are hard to learn; but there is no simple, well-defined test for whether the learner has got it right or wrong. The point is not merely that, like cutting wood or matching colors, you can be more or less right, that errors come in degrees. Rather, for driving, dancing, and even pronunciation, there is no well-defined standard specifying the difference between correct and incorrect. This is not to deny that there is a difference between doing well and doing badly, or that experts can tell. Quite the contrary: the claim is that “telling the difference” is itself a skill—one that is likewise hard to learn, and for which there can be no exact specification of what is done, or how.
Even driving, dancing, and pronunciation, however, are more socially circumscribed and narrowly delimited than most of what we do. From cooking to lovemaking, from playing with the kids to shopping in the mall, our lives are filled with activities that exhibit human learning and human intelligence, that some of us are better at than others (as the good ones can tell), but for which none of us could articulate a definitive standard. This is the character of skillful being in the world in general. The simple, interface-like definiteness of what counts as accuracy in typing or color matching is, by contrast, the special case. So the hump in the rug can’t slide outward either. We have to make it all lie flat.
If there is no determinate interface between the mind and the body, or between the mind and the world, does this mean that the body and the world are somehow mental, or that the mind is corporeal and mundane? Yes, in a way, both. But not in a way that washes out all distinctions, rendering the three terms synonymous, and therefore redundant. As always, it is a matter of what we are trying to understand. When we are studying anatomy and physiology, the brain is relatively separable from the rest of the organism; the organism itself is even more separable from its environment; and the mind isn’t in the picture at all. When, on the other hand, our topic is intelligence, then the mind is very much to the point, and its scope and limits are part of the issue.
Intelligence abides in the meaningful. This is not to say that it is surrounded by or directed toward the meaningful, as if they were two separate phenomena, somehow related to one another. Rather, intelligence has its very existence in the meaningful as such—in something like the way a nation’s wealth lies in its productive capacity, or a corporation’s strength may consist in its market position. Of course, the meaningful, here, cannot be wholly passive and inert, but must include also activity and process. Intelligence, then, is nothing other than the overall interactive structure of meaningful behavior and objects. This is a view shared by scientists and philosophers, all the way from the most classical AI to its most radical critics—including, among others, Simon and Dreyfus. Why?
Perhaps the basic idea can be brought out this way. Intelligence is the ability to deal reliably with more than the present and the manifest. That’s surely not an adequate definition of intelligence, but it does get at something essential, and, in particular, something that has to do with meaning. For instance, representations—especially mental representations—are often taken as the archetype of the meaningful, and that wherein intelligence abides. The connection is straightforward. Representations are clearly an asset in coping with the absent and covert, insofar as they themselves are present, and “stand in for” something else—something absent—which they “represent”. This “standing in for” is their meaningfulness; and it is what makes intelligence possible.
How does it work? A typical sort of story goes like this. Individual representations can function as such only by participating, in concert with many others, in a larger and norm-governed scheme of representation. Then, assuming the scheme itself is in good shape, and is used correctly, a system can vicariously keep track of and explore absent and covert represented phenomena by keeping track of and exploring their present and manifest representational stand-ins. (Really, what it means for a scheme to be “in good shape” is for this coping at one remove to be generally feasible.) In effect, the structure of the extant representations, in conjunction with that of the scheme itself, “encode” something of the structure of what is represented, in such a way that the latter can be accommodated or taken account of, even when out of view.
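The "coping at one remove" story can be made concrete with a toy sketch. Everything here (the `Scheme` class, the objects, the locations) is invented for illustration, not drawn from the text: a stand-in is kept accurate while its object is in view, and afterwards the system deals with the absent object by consulting the present stand-in instead.

```python
# Toy sketch of coping "at one remove": present, manifest stand-ins
# are consulted in place of absent, covert objects.

class Scheme:
    """A minimal norm-governed scheme: stand-ins are object -> location entries."""

    def __init__(self):
        self.stand_ins = {}  # the present, manifest tokens

    def observe(self, obj, location):
        # While the object is in view, keep its stand-in accurate
        # (this is the "used correctly" norm of the scheme).
        self.stand_ins[obj] = location

    def where_is(self, obj):
        # The object itself may be absent; we track it vicariously
        # by consulting its stand-in.
        return self.stand_ins.get(obj)

world = Scheme()
world.observe("keys", "kitchen table")
# The keys are now out of view, but the stand-in lets us deal with them:
print(world.where_is("keys"))  # kitchen table
```

The scheme being "in good shape" amounts, on this sketch, to the stand-ins staying accurate enough that consulting them generally works.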
An alternative understanding of intelligence might keep the basic framework of this account, while modifying certain specifics. Our discussion of transduction, for instance, may have cast some suspicion on the idea of mental representation, or even of mind/world separation. But it need not undermine the broader view that intelligence abides in the meaningful, or that it consists in an ability to deal with the unobvious. Could there be a way to retain these latter, but without the former? That is, could there be a way to understand the effectiveness of intelligence in terms of meaningfulness, but without representations or a separated inner realm?
Not long after the passage about skills and muscular gestalts, Dreyfus addresses exactly this question.
When we are at home in the world, the meaningful objects embedded in their context of references among which we live are not a model of the world stored in our mind or brain: they are the world itself. (265–266)
There are really several (closely related) points being made in this dense and powerful sentence. First, there is, so to speak, the locus of the meaningful; second its character; and third, our situation with regard to it. The meaningful is not in our mind or brain, but is instead essentially worldly. The meaningful is not a model—that is, it’s not representational—but is instead objects embedded in their context of references. And we do not store the meaningful inside of ourselves, but rather live and are at home in it. These are all summed up in the slogan that the meaningful is the world itself. (This may be reminiscent, anachronistically, of Brooks’s later but less radical dictum that the world is its own best model.)
The first thesis, in its negative aspect, is simply a repudiation of the view, almost ubiquitous in cognitive science and traditional philosophy, that the meaningful objects amidst which intelligence abides are primarily inner. “Classical” cognitive scientists restrict these inner objects to symbolic expressions and models, whereas others are more liberal about mental images, cognitive maps, and maybe even “distributed representations”. But Dreyfus wants to extend the meaningful well beyond the inner, in any traditional sense: meaningful objects are “the world itself”.
It is important to guard against a possible misunderstanding. Everyone would allow that worldly objects can be meaningful in a derivative way, as when we assign to them meanings that we somehow already have. You and I, for instance, could agree to use a certain signal to mean, say, that the British are coming; and then it would indeed mean that—but only derivatively from our decision. (Many philosophers and scientists would hold further that this is the only way that external objects can come to be meaningful.) By contrast, when Dreyfus says that meaningful objects are the world itself, he means original meaning, not just derivative. That is, intelligence itself abides “out” in the world, not just “inside”—contra cognitive science, classical or otherwise.
The second thesis, in its negative aspect, is again a repudiation of an almost universal assumption: that the meaningful is primarily representational. As before, the target is not only the classical symbolic approach, but most of its more liberal successors. These two negative points combined constitute a rejection of what is sometimes called “the representational theory of the mind”. In its positive aspect, that the meaningful is “objects embedded in their context of references”, the thesis may call for some explanation. Clearly what Dreyfus has in mind are tools and other paraphernalia. What is the sense in which these are meaningful?
We might begin by saying, very roughly, that the meaningful in general is that which is significant in terms of something beyond itself, and subject to normative evaluation according to that significance. Then we could see representations as familiar paradigms of the meaningful in this sense. That in terms of which a representation is significant is that which it purports to represent—its object—and it is evaluated according to whether it represents that object correctly or accurately. When cognitive scientists and philosophers speak of meaningful inner entities, they always mean representations (nothing other than representations has ever been proposed as inner and meaningful). Descartes, in effect, invented the “inner realm” as a repository for cognitive representations—above all, representations of what’s outside of it—and cognitive science hasn’t really changed that at all.
But when Dreyfus holds that meaningful objects are the world itself, he doesn’t just (or even mostly) mean representations. The world can’t be representation “all the way down”. But that’s not to say that it can’t all be meaningful, because there are more kinds of significance than representational content. A number of philosophers earlier in the twentieth century—Dewey, Heidegger, Wittgenstein, and Merleau-Ponty, to name a few of the most prominent—have made much of the significance of equipment, public places, community practices, and the like. A hammer, for instance, is significant beyond itself in terms of what it’s for: driving nails into wood, by being wielded in a certain way, in order to build something, and so on. The nails, the wood, the project, and the carpenter him or herself, are likewise caught up in this “web of significance”, in their respective ways. These are the meaningful objects that are the world itself; and none of them is a representation.
There’s an obvious worry here that the whole point depends on a pun. Of course, hammers and the like are “significant” (and even “meaningful”) in the sense that they’re important to us, and interdependent with other things in their proper use. But that’s not the same as meaning in the sense of bearing content or having a semantics. Certainly! That’s why they’re not representations. So it’s agreed: they are meaningful in a broader sense, though not in a narrower one. The real question is: Which sense matters in the context of understanding human intelligence?
The third thesis is that we live in the meaningful—that is, in the world—and are at home there. Part of the point, to be sure, is that we reside in the midst of our paraphernalia, and are accustomed to it. But the more fundamental insight must connect the meaningful as such with the nature of intelligence. It is clear enough how tools can extend our capacity to cope with the present and manifest; that is more or less the definition of a tool. But how do they help us deal with the absent and covert? Or, rather: aren’t those tools that do help us with the absent and covert precisely, and for that very reason, representations? Not at all.
Consider the ability to get to San Jose. That’s a capacity to deal with something out of view—a distant city—and so just what is characteristic of intelligence. Moreover, a cognitive scientist will instinctively attribute it to some sort of representation, either an internal or external map or set of instructions, which an intelligent system either consists in or can consult and follow. But that’s not the only way to achieve the effect. A quite different approach would be to keep a stable of horses, one pre-trained for each likely destination. Then all that the capable person would need to do is pick the right horse, stay on it, and get off at the end. Here we’re inclined to say that it’s the horse that knows the way, not the rider—or maybe that the full ability is really collaborative, say like Gilbert and Sullivan’s. At any rate, the horse’s contribution is not to be ignored.
Now let me tell you how I get to San Jose: I pick the right road (Interstate 880 south), stay on it, and get off at the end.1 Can we say that the road knows the way to San Jose, or perhaps that the road and I collaborate? I don’t think this is as crazy as it may first sound. The complexity of the road (its shape) is comparable to that of the task and highly specific thereto; moreover, staying on the road requires constant high-bandwidth interaction with this very complexity. In other words, the internal guidance systems and the road itself must be closely coupled, in part because much of the “information” upon which the ability depends is “encoded” in the road. Thus, much as an internal map or program, learned and stored in memory, would (pace Simon) have to be deemed part of an intelligent system that used it to get to San Jose, so I suggest the road should be considered integral to my ability.
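The contrast can be pictured schematically (the road encoding, step format, and function names below are all invented for the example): a map-user replays a stored internal route, while a road-follower stores nothing and instead picks up each heading from the road itself, step by step. Both end up with the same route, because the same information is there; what differs is where it lives.

```python
# Schematic contrast: the route information can live "inside"
# (a stored representation) or "outside" (encoded in the road's shape).

def navigate_by_map(route):
    # The intelligence is inside: a learned, stored representation
    # is replayed from memory.
    return list(route)

def navigate_by_road(road):
    # The intelligence is partly outside: at each moment the driver
    # only picks up the local heading from the road and follows it.
    path = []
    position = 0
    while position < len(road):
        heading = road[position]  # high-bandwidth local pickup, not recall
        path.append(heading)
        position += 1
    return path

road = ["S", "S", "SW", "S"]  # Interstate 880 south, very schematically
print(navigate_by_road(road) == navigate_by_map(road))  # True: same route, different locus
```

The point of the sketch is only the asymmetry of storage: `navigate_by_road` never holds the whole route at once, yet the coupled system (driver plus road) gets to the same place.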
Don’t be distracted by the fact that the road was designed and built by intelligent engineers who, no doubt, knew the route. Even if we might want to extend the collaboration in this case, the engineers are not essential in the way that the road itself is; for some “roads”—forest trails, for instance—need not be intelligently designed, yet the argument works the same for them. A more serious worry is how narrow the example is: intelligent navigation ought to be more flexible, allowing, say, for alternative destinations and starting points. And then it might seem that the intelligence lies in this adaptability—knowing how to get there from the east as well as from the north, or where to turn off to get to Palo Alto or Modesto instead—which is internal after all. But, in the first place, even that flexibility is mostly encoded in the world, in the road signs that enable one to choose and stay on the “right” road. And, in the meantime, the road itself still holds the information for getting from one junction to the next. Most important, however, is to remember the point: it’s not that all of the structure of intelligence is “external”, but only that some of it is, in a way that is integral to the rest.
Still, the road example is quite limited. How much of what a culture has learned about life and its environment is “encoded” in its paraphernalia and practices? Consider, for example, agriculture—without question, a basic manifestation of human intelligence, and dependent on a vast wealth of information accumulated through the centuries. Well, where has this information accumulated? Crucial elements of that heritage, I want to claim, are embodied in the shapes and strengths of the plow, the yoke, and the harness, as well as the practices for building and using them. The farmer’s learned skills are essential too; but these are nonsense apart from the specific tools they involve, and vice versa. Their interaction must be high-bandwidth, in real time. Hence, they constitute an essential unity—a unity that incorporates overall a considerable expertise about the workability of the earth, the needs of young plants, water retention, weed control, root development, and so on.
The structure of an intricate and established institution can likewise be an integral contribution to an understanding of how, say, to build cars or manage a city. That the departments are related this way, that the facilities for that are over there, that these requests must be submitted to those offices on such and such forms—all of this constitutes, if not a theory, at least an essential part of the architecture of a very considerable overall competence. Such competence is as distinctively human as is any other sophisticated art or technology. Yet, not only is it not the competence of any single individual, it is also not the sum of the competencies of all the individuals—for that sum would not include the structure of established interrelationships and institutional procedures, not to mention the physical plant, which are prerequisite to the whole. The point is not merely that organizations evolve in functionally effective ways, as do insects and trees, but rather that the structure of an institution is implemented in the high-bandwidth intelligent interactions among individuals, as well as between individuals and their paraphernalia. Furthermore, the expertise of those individuals could not be what it is apart from their participation in that structure. Consequently, the intelligence of each is itself intelligible only in terms of their higher unity.
Even in so self-conscious a domain as a scientific laboratory, whether research or development oriented, much of the intelligent ability to investigate, distinguish, and manipulate natural phenomena is embodied in the specialized instrumentation, the manual and perceptual skills required to use and maintain it, and the general laboratory ethos of cleanliness, deliberation, and record keeping. Without these, science would be impossible; they are integral to it. The point is not that theory is baseless without evidence, or useless without applications. Rather, apart from its intimate involvement in highly specific complex activities in highly specific complex circumstances, there’s no such thing as scientific intelligence—it doesn’t make any sense. For all its explicitness and abstraction, science is as worldly as agriculture, manufacturing, and government.
I have postponed till last the most obvious externalization of human intelligence—texts, images, maps, diagrams, programs, and the like—not because I underestimate their importance, but because they are so similar to what is traditionally supposed to be in the mind. That poses two dangers. First, it distracts attention from the radicalness of the claim that intelligence abides in the meaningful world: not just books and records, but roads and plows, offices, laboratories, and communities. Second, it makes it too easy for a traditionalist to think: “External representations are not really integral to intelligence, but are merely devices for conveying or restoring to intelligence proper—the inner mind—contents which it might otherwise lack.” By now, however, these dangers will (I hope) have abated. So it can safely be acknowledged that (to borrow Simon’s phrase) the “great furniture of information” that civilization has accumulated belongs with the rest of its furniture in the abode of its understanding.
If we are to understand mind as the locus of intelligence, we cannot follow Descartes in regarding it as separable in principle from the body and the world. I have argued that such separability would have to coincide with narrow-bandwidth interfaces, among the interactions that are relevant to intelligence. In recent decades, a commitment to understanding intelligence as rational problem solving—sometimes assumed a priori—has supported the existence of these interfaces by identifying them with transducers. Broader approaches, freed of that prejudicial commitment, can look again at perception and action, at skillful involvement with public equipment and social organization, and see not principled separation but all sorts of close coupling and functional unity. As our ability to cope with the absent and covert, human intelligence abides in the meaningful—which, far from being restricted to representations, extends to the entire human world. Mind, therefore, is not incidentally but intimately embodied and intimately embedded in its world.
Rodney A. Brooks
1991
Artificial intelligence started as a field whose goal was to replicate human-level intelligence in a machine. Early hopes diminished as the magnitude and difficulty of that goal was appreciated. Slow progress was made over the next 25 years in demonstrating isolated aspects of intelligence. Some recent work has tended to concentrate on commercializable aspects of “intelligent assistants” for human workers.
No one talks about replicating the full gamut of human intelligence anymore. Instead we see a retreat into specialized subproblems, such as knowledge representation, natural language understanding, vision, or even more specialized areas such as truth maintenance or plan verification. All the work in these subareas is benchmarked against the sorts of tasks humans do within those areas. Amongst the dreamers still in the field of AI (those not dreaming about dollars, that is) there is a feeling that one day all these pieces will fall into place and we will see “truly” intelligent systems emerge.
However, I and others believe that human-level intelligence is too complex and too little understood to be correctly decomposed into the right subpieces at the moment, and that even if we knew the subpieces we still wouldn’t know the right interfaces between them. Furthermore we will never understand how to decompose human-level intelligence until we’ve had a lot of practice with simpler intelligences.
In this paper I therefore argue for a different approach to creating artificial intelligence.
We have been following this approach and have built a series of autonomous mobile robots. We have reached an unexpected conclusion (C) and have a rather radical hypothesis (H).
(C) When we examine very simple level intelligence we find that explicit representations and models of the world simply get in the way. It turns out to be better to let the world itself serve as its own model.
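Conclusion (C) can be caricatured in a few lines of code. The grid world, the one-cell "sensor", and the swerve rule below are all invented for illustration; the point is only the control structure: the agent keeps no stored model, but re-senses the world itself on every cycle and reacts.

```python
# Minimal reactive-control sketch in the spirit of (C): no cached
# world model; each cycle the robot re-reads the world and reacts.

def sense(world, pos):
    # Consult the world itself (not a stored copy): what's just ahead?
    x, y = pos
    return world.get((x + 1, y), "free")

def step(world, pos):
    x, y = pos
    if sense(world, pos) == "obstacle":
        return (x, y + 1)  # swerve sideways
    return (x + 1, y)      # advance

world = {(2, 0): "obstacle"}  # the world, not a model of it
pos = (0, 0)
for _ in range(4):
    pos = step(world, pos)
print(pos)  # (3, 1): the robot swerved around the obstacle
```

If someone moved the obstacle mid-run, the next `sense` call would simply report the new situation; there is no internal model to fall out of date.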
(H) Representation is the wrong unit of abstraction in building the bulkiest parts of intelligent systems.
Representation has been the central issue in artificial intelligence work over the last 15 years only because it has provided an interface between otherwise isolated modules and conference papers.
We already have an existence proof of the possibility of intelligent entities: human beings. Additionally, many animals are intelligent to some degree. (This is a subject of intense debate, much of which really centers around a definition of intelligence.) They have evolved over the 4.6 billion year history of the earth.
It is instructive to reflect on the way in which earth-based biological evolution spent its time. Single-cell entities arose out of the primordial soup roughly 3.5 billion years ago. A billion years passed before photosynthetic plants appeared. After almost another billion and a half years, around 550 million years ago, the first fish and vertebrates arrived, and then insects 450 million years ago. Then things started moving fast. Reptiles arrived 370 million years ago, followed by dinosaurs at 330 and mammals at 250 million years ago. The first primates appeared 120 million years ago and the immediate predecessors to the great apes a mere 18 million years ago. Man arrived in roughly his present form 2.5 million years ago. He invented agriculture a scant 19,000 years ago, writing less than 5000 years ago and “expert” knowledge only over the last few hundred years.
This suggests that problem-solving behavior, language, expert knowledge and application, and reason are all pretty simple once the essence of acting and reacting are available. That essence is the ability to move around in a dynamic environment, sensing the surroundings to a degree sufficient to achieve the necessary maintenance of life and reproduction. This part of intelligence is where evolution has concentrated its time—it is much harder.
I believe that mobility, acute vision and the ability to carry out survival related tasks in a dynamic environment provide a necessary basis for the development of true intelligence. Moravec (1984) argues this same case rather eloquently.
Human level intelligence has provided us with an existence proof, but we must be careful about what lessons are to be gained from it.
A story
Suppose it is the 1890’s. Artificial flight is the glamor subject in science, engineering, and venture capital circles. A bunch of AF researchers are miraculously transported by a time machine to the 1990’s for a few hours. They spend the whole time in the passenger cabin of a commercial passenger Boeing 747 on a medium duration flight.
Returned to the 1890’s they feel invigorated, knowing that AF is possible on a grand scale. They immediately set to work duplicating what they have seen. They make great progress in designing pitched seats, double pane windows, and know that if only they can figure out those weird ‘plastics’ they will have the grail within their grasp. (A few connectionists amongst them caught a glimpse of an engine with its cover off and they are preoccupied with inspirations from that experience.)
Artificial intelligence researchers are fond of pointing out that AI is often denied its rightful successes. The popular story goes that when nobody has any good idea of how to solve a particular sort of problem (for example, playing chess) it is known as an AI problem. When an algorithm developed by AI researchers successfully tackles such a problem, however, AI detractors claim that since the problem was solvable by an algorithm, it wasn’t really an AI problem after all. Thus AI never has any successes.
But have you ever heard of an AI failure?
I claim that AI researchers are guilty of the same (self-)deception. They partition the problems they work on into two components. The AI component, which they solve, and the non-AI component which they don’t solve. Typically, AI “succeeds” by defining the parts of the problem that are unsolved as not AI. The principal mechanism for this partitioning is abstraction. Its application is usually considered part of good science, and not (as it is in fact used in AI) as a mechanism for self-delusion. In AI, abstraction is usually used to factor out all aspects of perception and motor skills. I argue below that these are the hard problems solved by intelligent systems, and further that the shape of solutions to these problems constrains greatly the correct solutions of the small pieces of intelligence which remain.
Early work in AI concentrated on games, geometrical problems, symbolic algebra, theorem proving, and other formal systems (see the classic papers in Feigenbaum and Feldman, 1963 and Minsky, 1968). In each case, the semantics of the domains were fairly simple.
In the late sixties and early seventies, the “blocks world” became a popular domain for AI research. It had a uniform and simple semantics. The key to success was to represent the state of the world completely and explicitly. Search techniques could then be used for planning within this well-understood world. Learning could also be done within the blocks world; there were only a few simple concepts worth learning, and they could be captured by enumerating the set of subexpressions which must be contained in any formal description of a world containing an instance of the concept. The blocks world was even used for vision research and mobile robotics, as it provided strong constraints on the perceptual processing necessary (see, for instance, Nilsson, 1984).
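The recipe the passage describes—represent the world state completely and explicitly, then search—can be sketched for a tiny two-stack problem. The state encoding (a tuple of stacks) and the breadth-first search are illustrative choices of mine, not a reconstruction of any particular historical planner.

```python
# Sketch of the classical blocks-world recipe: an explicit, complete
# state representation, plus search over legal moves.

from collections import deque

def moves(state):
    # Every legal move: lift the top block of one stack onto another.
    for i, src in enumerate(state):
        if not src:
            continue
        for j in range(len(state)):
            if i == j:
                continue
            nxt = [list(s) for s in state]
            block = nxt[i].pop()
            nxt[j].append(block)
            yield tuple(tuple(s) for s in nxt)

def plan(start, goal):
    # Breadth-first search through explicit world states.
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path + [state]
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [state]))

start = (("A", "B"), ())  # B sits on A; an empty spot on the table
goal = ((), ("B", "A"))   # the pile rebuilt with A on B
print(len(plan(start, goal)))  # 3 states: start, intermediate, goal
```

Notice how much the approach leans on the "uniform and simple semantics" the passage mentions: the whole world is a small tuple, so complete, explicit representation is trivially available.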
Eventually, criticism surfaced that the blocks world was a “toy world” and that within it there were simple special purpose solutions to what should be considered more general problems. At the same time there was a funding crisis within AI (both in the US and the UK, the two most active places for AI research at the time). AI researchers found themselves forced to become relevant. They moved into more complex domains, such as trip planning, going to a restaurant, medical diagnosis, and such like.
Soon there was a new slogan: “Good representation is the key to AI” (as in: conceptually efficient programs, Bobrow and Brown, 1975). The idea was that by representing only the pertinent facts explicitly, the semantics of a world (which on the surface was quite complex) were reduced to a simple closed system once again. Abstraction to only the relevant details thus simplified the problems.
Consider chairs, for example. While these two characterizations are true,
(CAN (SIT-ON PERSON CHAIR)), and
(CAN (STAND-ON PERSON CHAIR)),
there is really much more to the concept of a chair. Chairs have some flat (maybe) sitting place, with perhaps a back support. They have a range of possible sizes, requirements on strength, and a range of possibilities in shape. They often have some sort of covering material—unless they are made of wood, metal or plastic. They sometimes are soft in particular places. They can come from a range of possible styles. In sum, the concept of what is a chair is hard to characterize simply. There is certainly no AI vision program that can find arbitrary chairs in arbitrary images; at best, such programs can find one particular type of chair in carefully selected images.
This characterization, however, is perhaps the correct AI representation for solving certain problems—for instance, one in which a hungry person sitting on a chair in a room can see a banana hanging from the ceiling just out of reach. Such problems are never posed to AI systems by showing them a photo of the scene. A person (even a young child) can make the right interpretation of the photo and suggest a plan of action. For AI planning systems, however, the experimenter is required to abstract away most of the details to form a simple description in terms of atomic concepts such as PERSON, CHAIR and BANANA.
But this abstraction process is the essence of intelligence and the hard part of the problems being solved. Under the current scheme, the abstraction is done by the researchers, leaving little for the AI programs to do but search. A truly intelligent program would study the photograph, perform the abstraction itself, and solve the problem.
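To see how little is left once the abstraction has been done, here is a sketch in Python of the banana problem as it actually reaches a planner: a handful of atomic assertions and STRIPS-style operators, over which only search remains. The predicates and operators are invented for illustration; this is not the code of any particular historical system.

```python
from collections import deque

# The researcher's hand abstraction of the scene: atomic assertions the
# program never has to extract from a photograph. All names are invented.
facts = {("AT", "CHAIR", "CORNER"), ("AT", "PERSON", "FLOOR")}

# STRIPS-style operators: (name, preconditions, additions, deletions).
operators = [
    ("PUSH-CHAIR-UNDER-BANANA",
     {("AT", "CHAIR", "CORNER")},
     {("AT", "CHAIR", "UNDER-BANANA")},
     {("AT", "CHAIR", "CORNER")}),
    ("CLIMB-CHAIR",
     {("AT", "CHAIR", "UNDER-BANANA"), ("AT", "PERSON", "FLOOR")},
     {("ON", "PERSON", "CHAIR")},
     {("AT", "PERSON", "FLOOR")}),
    ("GRAB-BANANA",
     {("ON", "PERSON", "CHAIR")},
     {("HAS", "PERSON", "BANANA")},
     set()),
]

def plan(start, goal):
    """Breadth-first search over operator applications: all that is left
    for the program once perception and abstraction are done for it."""
    frontier = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, pre, add, delete in operators:
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

print(plan(facts, {("HAS", "PERSON", "BANANA")}))
# → ['PUSH-CHAIR-UNDER-BANANA', 'CLIMB-CHAIR', 'GRAB-BANANA']
```

The hard work—turning a photograph of a room into those three tuples—happens before the first line of this program runs.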
The only input to most AI programs is a restricted set of simple assertions deduced from the real data by humans. The problems of recognition, spatial understanding, dealing with sensor noise, partial models, and the like, are all ignored. These problems are relegated to the realm of input black boxes. Psychophysical evidence suggests they are all intimately tied up with the representation of the world used by an intelligent system.
There is no clean division between perception (abstraction) and reasoning in the real world. The brittleness of current AI systems attests to this fact. For example, MYCIN (Shortliffe, 1976) is an expert at diagnosing human bacterial infections; but it really has no model of what a human (or any living creature) is or how they work, or what are plausible things to happen to a human. If told that the aorta is ruptured and the patient is losing blood at the rate of a pint every minute, MYCIN will try to find a bacterial cause of the problem.
Thus, because we still perform all the abstractions for our programs, most AI work is still done in the equivalent of the blocks world. Now the blocks are slightly different shapes and colors, but their underlying semantics have not changed greatly.
It could be argued that performing this abstraction (perception) for AI programs is merely the normal reductionist use of abstraction common in all good science. The abstraction reduces the input data so that the program experiences the same “perceptual world” (what von Uexküll, 1921 called a Merkwelt) as humans. Other (vision) researchers will independently fill in the details at some other time and place. I object to this on two grounds. First, as von Uexküll and others have pointed out, each animal species, and clearly each robot species with its own distinctly nonhuman sensor suites, will have its own different Merkwelt. Second, the Merkwelt we humans provide our programs is based on our own introspection. It is by no means clear that such a Merkwelt is anything like what we actually use internally—it could just as easily be an output coding for communication purposes (thus, most humans go through life never realizing they have a large blind spot almost in the center of their visual fields).
The first objection warns of the danger that reasoning strategies developed for the human-assumed Merkwelt may not be valid when real sensors and perceptual processing are used. The second objection says that, even with human sensors and perception, the Merkwelt may not be anything like that used by humans. In fact, it may be the case that our introspective descriptions of our internal representations are completely misleading and quite different from what we really use.
A continuing story
Meanwhile our friends in the 1890’s are busy at work on their AF machine. They have come to agree that the project is too big to be worked on as a single entity and that they will need to become specialists in different areas. After all, they had asked questions of fellow passengers on their flight and discovered that the Boeing Co. employed over 6000 people to build such an airplane.
Everyone is busy, but there is not a lot of communication between the groups. The people making the passenger seats used the finest solid steel available as the framework. There was some muttering that perhaps they should use tubular steel to save weight, but the general consensus was that if such an obviously big and heavy airplane could fly then clearly there was no problem with weight.
On their observation flight, none of the original group managed a glimpse of the driver’s seat, but they have done some hard thinking and believe they have established the major constraints on what should be there and how it should work. The pilot, as he will be called, sits in a seat above a glass floor so that he can see the ground below and know where to land. There are some side mirrors so he can watch behind for other approaching airplanes. His controls consist of a foot pedal to control speed (just as in these newfangled automobiles that are starting to appear), and a steering wheel to turn left and right. In addition, the wheel stem can be pushed forward and back to make the airplane go up and down. A clever arrangement of pipes measures airspeed of the airplane and displays it on a dial. What more could one want? Oh yes. There’s a rather nice setup of louvers in the windows so that the driver can get fresh air without getting the full blast of the wind in his face.
An interesting sidelight is that all the researchers have by now abandoned the study of aerodynamics. Some of them had intensely questioned their fellow passengers on this subject and not one of the modern flyers had known a thing about it. Clearly the AF researchers had previously been wasting their time in its pursuit.
I wish to build completely autonomous mobile agents that co-exist in the world with humans, and are seen by those humans as intelligent beings in their own right. I will call such agents Creatures. This is my intellectual motivation. I have no particular interest in demonstrating how human beings work—although humans, like other animals, are interesting objects of study in this endeavor, inasmuch as they are successful autonomous agents. I have no particular interest in applications; it seems clear to me that, if my goals can be met, then the range of applications for such Creatures will be limited only by our (or their) imagination. I have no particular interest in the philosophical implications of Creatures, although clearly there will be significant implications.
Given the caveats of the previous two sections, and considering the parable of the AF researchers, I am convinced that I must tread carefully in this endeavor to avoid some nasty pitfalls.
For the moment then, consider the problem of building Creatures as an engineering problem. We will develop an engineering methodology for building Creatures.
First, let us consider some of the requirements for our Creatures.
Now, let us consider some of the valid engineering approaches to achieving these requirements. As in all engineering endeavors, it is necessary to decompose a complex system into parts, build the parts, and then interface them into a complete system.
Perhaps the strongest traditional notion of intelligent systems (at least implicitly among AI workers) has been of a central system, with perceptual modules as inputs and action modules as outputs. The perceptual modules deliver a symbolic description of the world and the action modules take a symbolic description of desired actions and make sure they happen in the world. The central system then is a symbolic information processor.
Traditionally, work in perception (and vision is the most commonly studied form of perception) and work in central systems has been done by different researchers and even totally different research laboratories. Vision workers are not immune to earlier criticisms of AI workers. Most vision research is presented as a transformation from one image representation (such as a raw grey-scale image) to another registered image (such as an edge image). Each group, AI and vision, makes assumptions about the shape of the symbolic interfaces. Hardly anyone has ever connected a vision system to an intelligent central system. Thus the assumptions independent researchers make are not forced to be realistic. There is a real danger from pressures to neatly circumscribe the particular piece of research being done.
The central system must also be decomposed into smaller pieces. We see subfields of artificial intelligence such as “knowledge representation”, “learning”, “planning”, “qualitative reasoning”, etc. The interfaces between these modules are also subject to intellectual abuse.
When researchers working on a particular module get to choose both the inputs and the outputs that specify the module requirements, I believe there is little chance the work they do will fit into a complete intelligent system.
This bug in the functional decomposition approach is hard to fix. One needs a long chain of modules to connect perception to action. In order to test any of them, they all must first be built. But until realistic modules are built, it is highly unlikely that we can predict exactly what modules will be needed or what interfaces they will need.
An alternative decomposition makes no distinction between peripheral systems, such as vision, and central systems. Rather, the fundamental slicing up of an intelligent system is in the orthogonal direction, dividing it into activity producing subsystems. Each activity, or behavior-producing system, individually connects sensing to action. We refer to an activity producing system as a layer. An activity is a pattern of interactions with the world. Another name for our activities might well be skills—since each activity can, at least post facto, be rationalized as pursuing some purpose. We have chosen the word ‘activity’, however, because our layers must decide when to act for themselves—not be some subroutine to be invoked at the beck and call of some other layer. We call Creatures that are decomposable into activities or behavior-producing layers in this way behavior-based systems.
The advantage of this approach is that it gives an incremental path from very simple systems to complex autonomous intelligent systems. At each step of the way, it is only necessary to build one small piece, and interface it to an existing, working, complete intelligence.
The idea is to build first a very simple complete autonomous system, and test it in the real world. Our favorite example of such a system is a Creature, actually a mobile robot, which avoids hitting things. It senses objects in its immediate vicinity and moves away from them, halting if it senses something in its path. It is still necessary to build this system by decomposing it into parts, but there need be no clear distinction between a “perception system,” a “central system” and an “action system”. In fact, there may well be two independent channels connecting sensing to action—one for initiating motion, and one for emergency halts—so there is no single place where “perception” delivers a representation of the world in the traditional sense.
Next we build an incremental layer of intelligence which operates in parallel to the first system. It is pasted onto the existing debugged system and tested again in the real world. This new layer might directly access the sensors and run a different algorithm on the delivered data. The first-level autonomous system continues to run in parallel, and unaware of the existence of the second level. For example, in Brooks (1986) we reported on building a first layer of control which let the Creature avoid objects, and then adding a layer which instilled an activity of trying to visit distant visible places. The second layer injected commands to the motor control part of the first layer, directing the robot towards the goal; but, independently, the first layer would cause the robot to veer away from previously unseen obstacles. The second layer monitored the progress of the Creature and sent updated motor commands, thus achieving its goal without being explicitly aware of obstacles, which had been handled by the lower level of control.
With multiple layers, the notion of perception delivering a description of the world gets blurred even more, as the part of the system doing perception is spread out over many pieces which are not particularly connected by data paths or related by function. Certainly there is no identifiable place where the “output” of perception can be found. Furthermore, totally different sorts of processing of the sensor data proceed independently and in parallel, each affecting the overall system activity through quite different channels of control.
In fact, not by design but rather by observation, we note that a common theme in the ways in which our layered and distributed approach helps our Creatures meet our goals is that there is no central representation.
Just as there is no central representation, there is not even a central system. Each activity-producing layer connects perception to action directly. It is only the observer of the Creature who imputes a central representation or central control. The Creature itself has none; it is a collection of competing behaviors. Out of the local chaos of their interactions, there emerges, in the eye of an observer, a coherent pattern of behavior. There’s no central, purposeful locus of control. (Minsky 1986 gives a similar account of how human behavior is generated.) Note carefully that we are not claiming that chaos is a necessary ingredient of intelligent behavior. Indeed, we advocate careful engineering of all the interactions within the system (evolution had the luxury of incredibly long time scales and enormous numbers of individual experiments, and thus perhaps was able to do without this careful engineering).
We do claim, however, that there need be no explicit representation of either the world or the intentions of the system to generate intelligent behaviors for a Creature. Without such explicit representations, and when viewed locally, the interactions may indeed seem chaotic and without purpose.
I claim there is more than this, however. Even at a local level, we do not have traditional AI representations. We never use tokens which have any semantics that can be attached to them. The best that can be said in our implementations is that a number is passed from one process to another. But it is only by looking at the state of both the first and second processes that that number can be given any interpretation at all. An extremist might say that we really do have representations, but they are just implicit. With an appropriate mapping of the complete system and its state to another domain, we could define representations that these numbers and topological connections between processes somehow encode.
However we are not happy with calling such things representations. They differ from standard representations in too many ways.
There are no variables that need instantiation in reasoning processes. (See Agre and Chapman, 1987 for a more thorough treatment of this.) There are no rules that need to be selected through pattern matching. There are no choices to be made. To a large extent, the state of the world determines the action of the Creature. Simon (1969) noted that the complexity of behavior of a system was not necessarily inherent in the complexity of the Creature, but perhaps in the complexity of the environment. He made this analysis in his description of an ant wandering the beach, but ignored its implications in the next paragraph when he talked about humans. We hypothesize (following Agre and Chapman) that much of even human-level activity is similarly a reflection of the world through very simple mechanisms without detailed representations.
In order to build systems based on an activity decomposition so that they are truly robust, we must rigorously follow a careful methodology.
First, it is vitally important to test the Creatures we build in the real world—the same world that we humans inhabit. It is disastrous to fall into the temptation of testing them in a simplified world first, even with the best intentions of later transferring activity to an unsimplified world. With a simplified world (matte painted walls, rectangular vertices everywhere, colored blocks as the only obstacles) it is very easy to build a submodule of the system that happens accidentally to rely on some of those simplified properties. This reliance can then easily be reflected in the requirements on the interfaces between that submodule and others. The disease spreads and the complete system depends in a subtle way on the simplified world. When it comes time to move to the unsimplified world, we gradually and painfully realize that every piece of the system must be rebuilt. Worse than that, we may need to rethink the total design, as the issues may change completely. We are not so concerned that it might be dangerous to test simplified Creatures first, and later add more sophisticated layers of control, because evolution has been successful using this approach.
Second, as each layer is built, it must be tested extensively in the real world. The system must interact with the real world over extended periods. Its behavior must be observed and be carefully and thoroughly debugged. When a second layer is added to an existing layer, there are three potential sources of bugs: the first layer, the second layer, and the interaction of the two layers. Eliminating the first of these sources of bugs as a possibility makes finding bugs much easier. Furthermore, there remains only one thing that it is possible to vary in order to fix the bugs—the second layer.
We have now built a series of robots based on the methodology of task decomposition. They all operate in an unconstrained dynamic world (laboratory and office areas in the MIT Artificial Intelligence Laboratory). They successfully operate with people walking by, people deliberately trying to confuse them, and people just standing around watching them. All these robots are Creatures in the sense that, on power-up, they exist in the world and interact with it, pursuing multiple goals determined by their control layers implementing different activities. This is in contrast to other mobile robots that are given programs or plans to follow for a specific mission.
Our first robot, named Allen, is shown in figure 23.1. Allen uses an offboard LISP machine for most of its computations. Allen implements the abstract architecture that we call the subsumption architecture, embodying the fundamental ideas of decomposition into layers of task-achieving behaviors, and incremental composition through debugging in the real world. (Details of this and other implementations can be found in Brooks, 1987.)
Figure 23.1
This is the first robot we built, called Allen.
Each layer in the subsumption architecture is composed of a fixed-topology network of simple finite state machines. Each finite state machine has a handful of states, one or two internal registers, one or two internal timers, and access to simple computational machines which can compute things such as vector sums. The finite state machines run asynchronously, sending and receiving fixed-length (in this case, 24-bit) messages over wires. For Allen, these were virtual wires; on our later robots we have used physical wires to connect computational components.
There is no central locus of control. Rather, the finite state machines are data-driven by the messages they receive. The arrival of messages or the expiration of designated time periods cause the finite state machines to change state. The finite state machines have access to the contents of the messages and might output them, test them with a predicate and conditionally branch to a different state, or pass them to simple computation elements. There is no possibility of access to global data, nor of dynamically established communications links. There is thus no possibility of global control. All finite state machines are equal, yet at the same time they are prisoners of their fixed-topology connections.
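As a rough sketch of this wiring model (the class and machine names here are invented, not taken from the Allen implementation), a wire is just a fixed list of listeners, and each machine acts only when a message is delivered to it:

```python
class Wire:
    """A fixed-topology wire. Sending a message is the only thing that
    drives the attached machines; there is no global data and no central
    scheduler. (Illustrative sketch, not the actual Allen code.)"""
    def __init__(self):
        self.listeners = []

    def send(self, msg):
        for machine in self.listeners:
            machine.deliver(msg)

class Runaway:
    """One illustrative finite state machine: it thresholds an incoming
    force message and, only if the force is large enough, passes it on
    to its output wire."""
    def __init__(self, threshold, out):
        self.threshold = threshold
        self.out = out

    def deliver(self, msg):
        if abs(msg) > self.threshold:
            self.out.send(msg)

# Wiring, fixed at construction time: force --> runaway --> turn
turn = Wire()
force = Wire()
force.listeners.append(Runaway(threshold=5, out=turn))
```

Nothing in such a network can reach outside its own registers and wires; the fixed topology is the whole architecture.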
Layers are combined through mechanisms we call suppression (whence the name ‘subsumption architecture’) and inhibition. In both cases, as a new layer is added, one of the new wires is side-tapped into an existing wire. A predefined time constant is associated with each side-tap. In the case of suppression, the side-tapping occurs on the input side of a finite state machine. If a message arrives on the new wire, it is directed to the input port of the finite state machine as though it had arrived on the existing wire. Additionally, any new messages on the existing wire are suppressed (that is, rejected) for the specified time period. For inhibition, the side-tapping occurs on the output side of a finite state machine. A message on the new wire simply inhibits messages being emitted on the existing wire for the specified time period. Unlike suppression, the new message is not delivered in its place.
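The two side-tap mechanisms can be sketched as follows, with a tick-based clock standing in for real time; the semantics follow the description above, but the class and method names are assumptions:

```python
class SuppressNode:
    """Sketch of a suppression side-tap on the input side of a machine:
    a message on the new wire is delivered as if it had arrived on the
    existing wire, and messages on the existing wire are then rejected
    for a fixed time period. (Illustrative, not the Allen code.)"""
    def __init__(self, period, downstream):
        self.period = period              # the predefined time constant
        self.downstream = downstream      # callable: the machine's input port
        self.suppress_until = -1
        self.clock = 0

    def tick(self):
        self.clock += 1

    def from_new_wire(self, msg):
        self.suppress_until = self.clock + self.period
        self.downstream(msg)              # the new message takes over the port

    def from_existing_wire(self, msg):
        if self.clock >= self.suppress_until:
            self.downstream(msg)          # otherwise rejected

class InhibitNode:
    """Inhibition taps the output side: a message on the new wire blocks
    output on the existing wire for the time period; nothing is delivered
    in its place."""
    def __init__(self, period, downstream):
        self.period = period
        self.downstream = downstream
        self.inhibit_until = -1
        self.clock = 0

    def tick(self):
        self.clock += 1

    def from_new_wire(self, _msg):
        self.inhibit_until = self.clock + self.period

    def from_existing_wire(self, msg):
        if self.clock >= self.inhibit_until:
            self.downstream(msg)
```

The asymmetry is the point: suppression substitutes a higher layer's message for the lower layer's traffic, while inhibition merely silences the lower layer.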
As an example, consider the three layers of figure 23.2. These are three layers of control that we have run on Allen for well over a year. The robot has a ring of 12 ultrasonic sonars as its primary sensors. Every second, these sonars are run to give twelve radial depth measurements. Sonar is extremely noisy due to many objects being mirrors to sonar. There are thus problems with specular reflection and return paths following multiple reflections due to surface skimming with low angles of incidence (less than thirty degrees).
Figure 23.2
We wire finite state machines together into layers of control. Each layer is built on top of existing layers. Lower layers never rely on the existence of higher-level layers. (This is Allen.)
In more detail the three layers work as follows:
1. The lowest-level layer implements a behavior which makes the robot (the physical embodiment of the Creature) avoid hitting objects. It avoids both static objects and moving objects—even those that are actively attacking it. The finite state machine labelled sonar simply runs the sonar devices and every second emits an instantaneous map with the readings converted to polar coordinates. This map is passed on to the collide and feelforce finite state machines. The first of these simply watches to see if there is anything dead ahead, and if so sends a halt message to the finite state machine in charge of running the robot forwards. (If that finite state machine is not in the correct state the message may well be ignored.) Simultaneously, the other finite state machine computes a repulsive force on the robot, based on an inverse-square law, where each sonar return is considered to indicate the presence of a repulsive object. The contributions from all the sonars are vector-added to produce an overall force acting on the robot. The output is passed to the runaway machine, which thresholds it and passes it on to the turn machine, which orients the robot directly away from the summed repulsive force. Finally the forward machine drives the robot forward. Whenever this machine receives a halt message while the robot is driving forward, it commands the robot to halt.
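The feelforce and runaway computations can be sketched directly from this description. The constants and conventions below (threshold, coordinate frame) are illustrative, not Allen's actual parameters:

```python
import math

def feelforce(sonar_map):
    """Sketch of the feelforce computation described above. Each sonar
    return (angle in radians, depth) is treated as a repulsive object
    whose influence falls off with the inverse square of its depth; the
    contributions are vector-added into one overall force on the robot."""
    fx = fy = 0.0
    for angle, depth in sonar_map:
        if depth <= 0:
            continue
        magnitude = 1.0 / depth ** 2       # inverse-square repulsion
        fx -= magnitude * math.cos(angle)  # force points away from the object
        fy -= magnitude * math.sin(angle)
    return fx, fy

def runaway(force, threshold=0.5):
    """Thresholds the summed force: returns an escape heading in radians
    if the force is strong enough, else None (the robot stays put)."""
    fx, fy = force
    if math.hypot(fx, fy) < threshold:
        return None
    return math.atan2(fy, fx)

# Twelve sonars; a wall one unit away dead ahead (angle 0), nothing close
# elsewhere. The escape heading points directly behind the robot (about pi).
readings = [(0.0, 1.0)] + [(math.radians(30 * i), 10.0) for i in range(1, 12)]
heading = runaway(feelforce(readings))
```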
This network of finite state machines generates behaviors which let the robot avoid objects. If it starts in the middle of an empty room it simply sits there. If someone walks up to it, the robot moves away. If it moves in the direction of other obstacles it halts. Overall, it manages to exist in a dynamic environment without hitting or being hit by objects.
2. The next layer makes the robot wander about, when not busy avoiding objects. The wander finite state machine generates a random heading for the robot every ten seconds or so. The avoid machine treats that heading as an attractive force and sums it with the repulsive force computed from the sonars. It uses the result to suppress the lower-level behavior, forcing the robot to move in a direction close to what wander decided but at the same time avoiding any obstacles. Note that if the turn and forward finite state machines are busy running the robot, the new impulse to wander will be ignored.
3. The third layer makes the robot try to explore. It looks for distant places, then tries to reach them. This layer suppresses the wander layer, and observes how the bottom layer diverts the robot due to obstacles (perhaps dynamic). It corrects for any divergences, and the robot achieves the goal.
The whenlook finite state machine notices when the robot is not busy moving, and starts up the free space finder (labelled stereo in the diagram) finite state machine. At the same time it inhibits wandering behavior so that the observation will remain valid. When a path is observed it is sent to the pathplan finite state machine, which injects a commanded direction to the avoid finite state machine. In this way lower-level obstacle avoidance continues to function. This may cause the robot to go in a direction different from that desired by pathplan. For that reason, the actual path of the robot is monitored by the integrate finite state machine, which sends updated estimates to the pathplan machine. This machine then acts as a difference engine, forcing the robot in the desired direction and compensating for the actual path of the robot as it avoids obstacles.
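The vector arithmetic in these layers is simple enough to sketch in a few lines. The following is only an illustrative reconstruction in Python, not the original implementation; the function names echo the feelforce, avoid, and runaway machines, and the threshold value is an arbitrary assumption.

```python
import math

def feelforce(sonar_map):
    """Sum inverse-square repulsive forces over the sonar returns.
    sonar_map: list of (distance, angle) polar readings; each return is
    treated as a repulsive object pushing on the robot."""
    fx = fy = 0.0
    for dist, angle in sonar_map:
        if dist <= 0:
            continue
        magnitude = 1.0 / (dist * dist)      # inverse-square law
        fx -= magnitude * math.cos(angle)    # force points away from object
        fy -= magnitude * math.sin(angle)
    return fx, fy

def runaway(force, threshold=0.5):
    """Threshold the repulsive force; only a strong enough force
    triggers the turn machine.  The threshold here is made up."""
    fx, fy = force
    return math.hypot(fx, fy) > threshold

def avoid(repulsive, wander_heading, weight=1.0):
    """Second layer: sum the wander heading (treated as a unit attractive
    force) with the repulsion, and return a direction for the turn machine."""
    fx, fy = repulsive
    fx += weight * math.cos(wander_heading)
    fy += weight * math.sin(wander_heading)
    return math.atan2(fy, fx)
```

Nothing here stores state between invocations; each machine is driven afresh by the latest sonar map, in keeping with the reactive style of the layers.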
These are just the particular layers that were first implemented on Allen. (See Brooks, 1986 for more details; Brooks and Connell (1986) report on another three layers implemented on that particular robot.)
Allen’s lowest layer was entirely reactive: it merely avoided collisions. But its next two layers, wander and explore, were not entirely reactive. Our second Creature, a mobile robot named Herbert (Connell, 1989), was a much more ambitious project, and pushed the idea of reactivity—as in Allen’s lowest layer—much further.
Herbert (shown in figure 23.3) used thirty infrared proximity sensors to navigate along walls and through doorways, a magnetic compass to maintain a global sense of direction, a laser scanner to find soda-can-like objects visually, and a host of sensors on an arm with a set of fifteen behaviors which, together, were sufficient to locate and pick up soda cans reliably. Herbert’s task was to wander around people’s offices looking for soda cans, pick one up, and bring it back to where the robot had started from. Herbert did succeed at this task (although mechanical failures in the seating of its onboard chips limited reliable operation to about fifteen minutes at a time).
Figure 23.3
This is Herbert, a more ambitious robot than Allen.
In programming Herbert, it was decided that it should maintain no internal state longer than three seconds, and that there would be no internal communication between behavior generating modules. Each one was connected to sensors on the input side, and a fixed-priority arbitration network on the output side. The arbitration network drove the actuators.
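The fixed-priority arbitration scheme can be sketched as follows. This is a hypothetical Python rendering of the idea only; Herbert's actual network was hardwired, and the module names and priorities below are invented for illustration.

```python
def arbitrate(behavior_outputs):
    """Fixed-priority arbitration: each entry is (priority, command) from one
    behavior-generating module, with command None if that module is silent.
    The highest-priority non-silent command drives the actuators; the modules
    never communicate with each other."""
    active = [(p, cmd) for p, cmd in behavior_outputs if cmd is not None]
    if not active:
        return None          # no behavior wants the actuators
    return max(active)[1]    # highest priority wins
```

For example, `arbitrate([(0, "wander"), (2, "retract-arm"), (1, None)])` yields `"retract-arm"`: the arm behavior simply out-prioritizes wandering, with no messages passed between the modules.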
Since Herbert maintained hardly any internal state—hardly any memory—it often had to rely on the world itself as its only available “model” of the world. Further, the world itself was the only effective medium of communication between Herbert’s separate modules. The laser-based soda-can finder, for example, drove the robot so that its arm was lined up in front of the soda can. But it did not tell the arm controller that there was now a soda can ready to be picked up. Rather, the arm behaviors monitored the shaft encoders on the wheels, and, when they noticed that there was no body motion, initiated motions of the arm—which, in turn, triggered other behaviors such that, eventually, the robot would pick up the soda can.
The advantage of this approach was that there was no need to set up internal expectations for what was going to happen next. That meant that the control system could both (1) be naturally opportunistic if fortuitous circumstances presented themselves, and (2) easily respond to changed circumstances—such as some other object approaching on a collision course.
As one example of how the arm behaviors cascaded upon one another, consider actually grasping a soda can. The hand had a grasp reflex that operated whenever something broke an infrared beam between the fingers. When the arm located a soda can with its local sensors, it simply drove the hand so that the two fingers lined up on either side of the can. The hand then independently grasped the can. Given this arrangement, it was possible for a human to hand a soda can to the robot. As soon as it was grasped, the arm retracted—it did not matter whether it was a soda can that was intentionally grasped, or one that magically appeared. The same opportunism among behaviors let the arm adapt automatically to a wide variety of cluttered desktops, and still successfully find the soda can.
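The cascade just described can be caricatured as a set of independent condition-action behaviors, each reading only sensors, never each other. This is a minimal sketch with hypothetical sensor names; the real behaviors were of course richer.

```python
def grasp_reflex(ir_beam_broken, hand_open):
    """The hand grasps whenever something breaks the infrared beam between
    the fingers -- no matter how the object got there."""
    return "close-hand" if ir_beam_broken and hand_open else None

def arm_initiate(wheel_encoder_delta, arm_stowed):
    """Arm motion starts when the shaft encoders report no body motion;
    the can-finder never sends the arm a message."""
    return "extend-arm" if wheel_encoder_delta == 0 and arm_stowed else None

def arm_retract(holding_object):
    """Once something is grasped, retract -- whether the can was found by
    the robot or handed to it by a person."""
    return "retract-arm" if holding_object else None
```

The world itself carries all the coordination: each function's trigger condition is a consequence, in the world, of some other behavior having run.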
The point of Herbert is two-fold:
The subsumption architecture with its network of simple machines is reminiscent, at the surface level at least, of a number of mechanistic approaches to intelligence, such as connectionism and neural networks. But it is different in many respects from these endeavors, and also quite different from many other post-Dartmouth1 traditions in artificial intelligence. We very briefly explain those differences in the following paragraphs.
Connectionists try to make networks of simple processors. In that regard, the things they build (in simulation only—no connectionist system has ever driven a real robot in a real environment, no matter how simple) are similar to the subsumption networks we build. However, their processing nodes tend to be uniform, and they seek insights (as their name suggests) from learning how best to interconnect them (which is usually assumed to mean richly, at least). Our nodes, by contrast, are all unique finite state machines, the density of connections among them is much lower, is not at all uniform, and is especially low between layers. Additionally, connectionists seem to be looking for explicit distributed representations to arise spontaneously from their networks. We harbor no such hopes because we believe representations are not necessary and appear only in the eye or mind of the observer.
Neural-network research is the parent discipline, of which connectionism is a recent incarnation. Workers in neural networks claim that there is some biological significance to their network nodes, as models of neurons. Most of the models seem wildly implausible given the paucity of modeled connections relative to the thousands found in real neurons. We claim no biological significance in our choice of finite state machines as network nodes.
Each individual activity-producing layer of our architecture could be viewed as an implementation of a production rule. When the right conditions are met in the environment, a certain action will be performed. We feel that analogy is a little like saying that any FORTRAN program with IF statements is implementing a production-rule system. But a production system really is more than that—it has a rule base, from which a particular rule is selected by matching the preconditions for some or all of the rules to a given database; and these preconditions may include variables which must be bound to individuals in that database. Our layers, on the other hand, run in parallel and have no variables or need for matching. Instead, aspects of the world are extracted and directly trigger or modify certain behaviors of the layer.
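The contrast can be made concrete. In the toy sketch below, a production system must match rule preconditions (with variable binding) against a database, while a subsumption-style layer is just a fixed condition wired to a fixed action. Both the matcher and the rule format are illustrative inventions, not drawn from any particular system.

```python
def match(precondition, fact):
    """Toy matcher: bind '?'-prefixed variables in the precondition to the
    corresponding elements of a fact, or return None on mismatch."""
    if len(precondition) != len(fact):
        return None
    bindings = {}
    for p, f in zip(precondition, fact):
        if isinstance(p, str) and p.startswith("?"):
            bindings[p] = f          # variable: bind to the individual
        elif p != f:
            return None              # constant: must match exactly
    return bindings

def production_step(rules, database):
    """A production system: select a rule by matching its precondition
    against the database, binding variables to individuals."""
    for precondition, action in rules:
        for fact in database:
            bindings = match(precondition, fact)
            if bindings is not None:
                return action(bindings)
    return None

def layer_step(obstacle_dead_ahead):
    """A subsumption layer: no database, no variables, no matching -- an
    extracted aspect of the world directly triggers a fixed behavior."""
    return "halt" if obstacle_dead_ahead else None
```

The production system's work is in the search and binding; the layer's "work" is done by the sensor wiring.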
If one really wanted, one could make an analogy of our networks to a blackboard control architecture. Some of the finite state machines would be localized knowledge sources. Others would be processes acting on these knowledge sources by finding them on the blackboard. There is a simplifying point in our architecture however: all the processes know exactly where to look on the “blackboard”, since they are hardwired to the correct place. I think this forced analogy indicates its own weakness. There is no flexibility at all in where a process can gather appropriate knowledge. Most advanced blackboard architectures make heavy use of the general sharing and availability of almost all knowledge. Furthermore, in spirit at least, blackboard systems tend to hide from a consumer of knowledge who the particular producer was. This is the primary means of abstraction in blackboard systems. In our system we make such connections explicit and permanent.
In some circles, much credence is given to Heidegger as one who understood the dynamics of existence. Our approach has certain similarities to work inspired by this German philosopher (for instance, Agre and Chapman, 1987) but our work was not so inspired. It is based purely on engineering considerations. That does not preclude it from being used in philosophical debate as an example on any side of any fence, however.
Situatedness, embodiment, intelligence, and emergence can be identified as key ideas that have led to the new style of artificial intelligence research that we are calling “behavior-based robots”.
Traditional artificial intelligence has adopted a style of research where the agents that are built to test theories about intelligence are essentially problem solvers that work in a symbolic abstracted domain. The symbols may have referents in the minds of the builders of the systems, but there is nothing to ground those referents in any real world. Furthermore, the agents are not situated in a world at all. Rather, they are simply given a problem, and they solve it. Then they are given another problem, and they solve that one. They are not participating in a world at all, as do agents in the usual sense.
In these systems, there is no external world per se, with continuity, surprises, or history. The programs deal only with a model world, with its own built-in physics. There is a blurring between the knowledge of the agent and the world it is supposed to be operating in. Indeed, in many artificial intelligence systems, there is no distinction between the two: the agent is capable of direct and perfect perception as well as direct and perfect action. When consideration is given to porting such agents or systems to operate in the world, the question arises of what sort of representation they need of the real world. Over the years within traditional artificial intelligence, it has become accepted that they will need an objective model of the world with individuated entities, tracked and identified over time. The models of knowledge representation that have been developed expect and require such a one-to-one correspondence between the world and the agent’s representation of it.
Early AI robots, such as Shakey and the Cart, certainly followed this approach. They built models of the world, planned paths around obstacles, and updated their estimates of where the objects were relative to themselves as they moved. We have developed a different approach (Brooks, 1986) in which a mobile robot uses the world itself as its own model—continuously referring to its sensors rather than to an internal world model. The problems of object class and identity disappear. The perceptual processing becomes much simpler. And this robot (Allen) performs better on comparable tasks than the Cart did. (The tasks carried out by Allen, not to mention Herbert, are in a different class from those attempted by Shakey—Shakey could certainly not have done what Allen does.)
A situated agent must respond in a timely fashion to its inputs. Modelling the world completely under these conditions can be computationally challenging. But a world in which it is situated also provides some continuity to the agent. That continuity can be relied upon, so that the agent can use its perception of the world instead of an objective world model. The representational primitives that are useful then change quite dramatically from those in traditional artificial intelligence.
The key idea from situatedness is: The world is its own best model.
There are two reasons that embodiment of intelligent systems is critical. First, only an embodied intelligent agent is fully validated as one that can deal with the real world. Second, only through a physical grounding can any internal symbolic or other system find a place to bottom out, and give “meaning” to the processing going on within the system.
The physical grounding of a robot within the world forces its designer to deal with all the issues. If the intelligent agent has a body, has sensors, and has actuators, then all the details and issues of being in the world must be faced. It is no longer possible to argue in conference papers that the simulated perceptual system is realistic, or that problems of uncertainty in action will not be significant. Instead, physical experiments can be done simply and repeatedly. There is no room for “cheating” (in the sense of self-delusion). When this is done, it is usual to find that many of the problems that used to seem significant are not so in the physical system. Typically, “puzzle-like” situations, where symbolic reasoning had seemed necessary, tend not to arise in embodied systems. At the same time, many issues that had seemed like nonproblems become major hurdles. Typically, these concern aspects of perception and action. (In fact, there is some room for cheating even here: for instance, the physical environment can be specially simplified for the robot—and it can be very hard in some cases to identify such self-delusions.)
Without an ongoing participation in and perception of the world, there is no meaning for an agent—everything is empty symbols referring only to other symbols. Arguments might be made that, at some level of abstraction, even the human mind operates in this solipsist position. However, biological evidence suggests that the human mind’s connection to the world is so strong, and so many-faceted, that these philosophical abstractions may not be correct.
The key idea from embodiment is: The world grounds the regress of meaning-giving.
Earlier, I argued that the sorts of activities we usually think of as demonstrating intelligence in humans have been taking place for only a very small fraction of our evolutionary lineage. I argued further that the “simple” things concerning perception and mobility in a dynamic environment took evolution much longer to perfect, and that all those capabilities are a necessary basis for “higher-level” intellect.
Therefore, I proposed looking at simpler animals as a bottom-up model for building intelligence. It is soon apparent, when “reasoning” is stripped away as the prime component of a robot’s intellect, that the dynamics of the interaction of the robot and its environment are primary determinants of the structure of its intelligence.
Simon’s (1969) discussion of the ant walking along a beach started off in a similar vein. He pointed out that the complexity of the behavior of the ant is more a reflection of the complexity of its environment than of its own internal complexity. He speculated that the same might be true of humans—but then, within two pages of text, reduced the study of human behavior to the domain of crypt-arithmetic problems.
It is hard to draw a line between what is intelligence and what is environmental interaction. In a sense, it doesn’t really matter which is which, inasmuch as all intelligent systems must be situated in some world or other if they are to be successful or useful entities.
The key idea from intelligence is: Intelligence is determined by the dynamics of interaction with the world.
In discussing where intelligence resides in an artificial intelligence program, Minsky (1961) points out that “there is never any ‘heart’ in a program,” but rather that, if we look, “we find senseless loops and sequences of trivial operations.” It is hard to point at a single component as the seat of intelligence. There is no homunculus. Rather, intelligence emerges from the interaction of the components of the system. The way in which it emerges, however, is quite different for traditional and for behavior-based artificial intelligence systems.
In traditional artificial intelligence, the modules that are defined are information-processing or functional modules. Typically, these might include a perception module, a planner, a world modeler, a learner, and the like. Such components directly participate in the functions of perceiving, planning, modeling, learning, and so on. Intelligent behavior of the system as a whole—such as avoiding obstacles, standing up, controlling gaze, et cetera—emerges from the interaction of the components.
In behavior-based artificial intelligence, by contrast, the modules that are defined are behavior-producing. Typically, these might include modules for obstacle avoidance, standing up, gaze control, and the like. Such components directly participate in producing the behaviors of avoiding obstacles, standing up, controlling gaze, and so on. Intelligent functionality of the system as a whole—such as perception, planning, modeling, learning, et cetera—emerges from the interaction of the components.
Although this dualism between traditional and behavior-based systems looks pretty, it is not entirely accurate. Traditional systems have hardly ever been really connected to the world, and so the emergence of intelligent behavior is, in most cases, more of an expectation than an established phenomenon. Conversely, because of the many behaviors present in a behavior-based system, and their individual dynamics of interaction with the world, it is often hard to say that a particular series of actions was produced by a particular behavior-module. Sometimes many behaviors are occurring simultaneously, or are switching rapidly.
It is not feasible to identify the seat of intelligence within any system, since intelligence is produced by the interactions of many components. Intelligence can only be determined by the total behavior of the system and how that behavior appears in relation to the environment.
The key idea from emergence is: Intelligence is in the eye of the observer.
Since our approach is performance based, it is the performance of the systems we build which must be used to measure its usefulness and to point to its limitations.
We claim that our behavior-based robots, using the subsumption architecture to implement complete Creatures, are by now the most reactive real-time mobile robots in existence. Most other mobile robots are still at the stage of individual “experimental runs” in static environments, or at best in completely mapped static environments. Ours, on the other hand, operate completely autonomously in complex dynamic environments at the flick of their on-switches, and continue until their batteries are drained. We believe they operate at a level closer to simple insect-level intelligence than to bacteria-level intelligence. Evolution took 3 billion years to get from single cells to insects, and only another 500 million years from there to humans. This statement is not intended as a prediction of our future performance, but rather to indicate the nontrivial nature of insect-level intelligence.
Despite this good performance to date, there are a number of serious questions about our approach. We have beliefs and hopes about how these questions will be resolved, but under our criteria only performance truly counts. Experiments and building more complex systems take time. So, in the interim, the best we can do is indicate where the main questions lie, with the hope that there is at least a plausible path forward to more intelligent machines from our current situation.
Our belief is that the sorts of activity-producing layers of control we are developing (mobility, vision, and survival related tasks) are necessary prerequisites for higher-level intelligence in the style we attribute to human beings. The most natural and serious questions concerning limits of our approach are:
Only experiments with real Creatures in real worlds can answer the natural doubts about our approach. Time will tell.
Barbara Webb
2023
The biorobotic approach is an identifiable strand of artificial intelligence that has two distinctive features. First, it sets out to reproduce (or account for) animal intelligence, focusing on those aspects of mind that humans have in common with other species capable of complex, goal-directed and adaptive behaviour. Second, it uses robotic models—real, physical systems that interact with the real world—to test hypotheses about intelligent function.
Although never mainstream, some of the earliest approaches to AI were biorobotic, e.g., Walter (1961). Brooks’s influential essay “Intelligence without Representation” (Brooks, 1991) is appropriately credited with reviving the approach for a generation of AI researchers. Brooks argued that what is normally considered intelligence in AI research (language, logical reasoning, playing chess) is a recent, largely cultural, development that depends on a deep foundation of successful world-interaction common to most animals. Moreover, the latter is the part of intelligence we have least understanding of, or ability to replicate. Perhaps, Brooks suggested, trying to reproduce the intelligence of insects would be a good way to approach some key problems in AI. Much of AI glosses over the problem of interaction with the world by not requiring the systems to act in the real world. It assumes an abstracted representation for reasoning and planning rather than requiring this representation to be derived from raw sensory data. Indeed, understanding what is needed for acting in the world often reveals that an abstracted representation is not actually needed for the task: getting across the room without running into objects may not require a veridical three-dimensional map of the room and explicit path planning.
In this chapter, I want to reflect on the outcome of taking the biorobotic approach to AI as a serious research agenda; and more specifically examine how building robot models of insect intelligence might provide novel insight into Mind Design. The essential motivation from an AI perspective remains the same: despite advances on many fronts, we are still largely unable to replicate the kind of animal intelligence that would allow a physical robot to achieve an interesting task under realistic and varying conditions. An example of the kind of task performed successfully by many animals but beyond the state of the art in robotics is central-place foraging as seen in ants and bees. Individual insects are able to traverse long distances over unknown terrain, identify scarce resource items from amongst a huge array of distractor objects, manipulate them for transport, and return to the origin of their journey despite having moved far beyond the range of any direct sensory contact with the hive or nest. This level of behavioural competence seems to strongly motivate the assumption that representational processes are taking place in the animal’s brain—even if not in the form of a full “cognitive map” (Webb, 2019) then at least in the ability to track internally the distance and direction of the nest (Webb, 2006).
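The internal tracking of the distance and direction of the nest mentioned above is classical path integration, and its geometry can be sketched in a few lines: the agent keeps a running home vector and updates it with each movement step. This is a geometric caricature offered only as orientation for the discussion, not Webb's neural-circuit model.

```python
import math

def integrate_step(home_vector, heading, distance):
    """Update the home vector after moving `distance` in direction `heading`
    (radians).  The vector always points from the agent back to the nest."""
    hx, hy = home_vector
    hx -= distance * math.cos(heading)
    hy -= distance * math.sin(heading)
    return hx, hy

# An outbound trip of two legs, then read off the homing direction/distance.
v = (0.0, 0.0)                             # start at the nest
v = integrate_step(v, 0.0, 3.0)            # 3 m east
v = integrate_step(v, math.pi / 2, 4.0)    # 4 m north
home_distance = math.hypot(*v)             # 5 m (3-4-5 triangle)
home_direction = math.atan2(v[1], v[0])    # bearing back to the nest
```

The point of the sketch is only that the quantity maintained is a single vector, continuously updated from self-motion cues, rather than any map of the terrain traversed.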
Although insect navigation has been previously debated as an exemplar of how intelligence requires representation (e.g. Gallistel, 2003), it has not been so widely discussed as the rodent navigation system (e.g. Bechtel, 2014). But it has a particular, recently developed advantage: we now have detailed and thoroughly grounded models of some of the key neural mechanisms of insect navigation. Despite many fascinating neuroscientific findings in rodents (and particularly interesting comparative insights from bats (Geva-Sagiv et al., 2015)), the actual, functional relevance for behaviour of the place-and-grid-cell systems in the mammalian brain is essentially unknown. Indeed, activity in these systems could even be epiphenomena of some more fundamental process. This leaves philosophers open to argue that rodent navigation operates with symbols, distributed representations, predictive models, or as a (non-representational) dynamical system (e.g. Segundo-Ortin and Hutto, 2019). By contrast, for the foraging insect we can describe, at the level of single identified neurons, the full pathway linking sensory input to motor commands that enable it to navigate back to its nest and demonstrate a circuit that supports this function when implemented on a robot. This would seem to provide a unique opportunity to resolve some of the fundamental questions around the representational nature of biological intelligence. When we look at the actual brain mechanisms underlying this task of significant cognitive complexity, do we see “Intelligence without Representation”?
I should outline, at this point, the sense of “representation” that I take to be at issue when using robots as a comparison point for animal intelligence. It is clear that, in both robots and animals, an important contributor to the flexibility of behaviour of which they are capable is the fact that a wide array of physical signals are transformed into a common electronic currency that supports sophisticated internal processing (e.g., with arbitrary input-output functions, delays, memory, gating, etc.). I do not take this in itself to constitute “representation” but will call it “computation” (inclusive of analog computation), in order to denote the difference from the type of processing (also necessary for a robot or an animal to behave in the world) that is inherently tied to one physical basis, such as sensory transduction, production of locomotor forces, or digestion/power consumption. In conventional robotics, such computational processing typically (but not necessarily) takes a representational form: reconstructing an internal model of relevant properties of the external world; and using operations on this model to determine suitable actions to be executed. Someone examining the robot’s code (although possibly not its instantiation in silicon operations) could point to particular variables that stand in for, and are operated on in logical accordance with, relevant world properties. The empirical question at issue is whether this is the correct, or most productive, way to interpret at least some of the brain processing that occurs in animals.
Some might argue that in the robot case, the “standing-in” properties of its variables are dependent on the intentionality of the human programmer, and are not inherently “representational” from the robot’s point of view (Searle, 1980). I think this is a defensible position, which also undermines the interpretation of any neural process as “representing” the world to the animal. However, this by no means settles the empirical question just posed, which I take to be the question of more practical interest to cognitive neuroscience and AI. More specifically, the issue I want to examine is the validity of a methodological approach to understanding natural brains that takes “representation” to be a key concept in two complementary ways. One is top-down: to understand an intelligent behaviour, we should consider what representational format and processes (if any) are required to derive appropriate actions from inputs, and then (if we care about implementation) look for these in brain processes. The other is bottom-up: by measuring the responses of neurons or brain areas to external cues, we will build up an understanding of what those responses represent and how they are processed to control behaviour. Both of these have been challenged by philosophers (e.g., Brette, 2019), but they remain the prevailing view.
Note here that the biorobotic approach per se is agnostic on this issue. Biorobotics seeks to discover and implement (as machines) the mechanisms of animal intelligence, whether or not these take a representational form. My aim in the following sections is to make the issue of “Intelligence without Representation” more concrete by explaining what we have learnt by taking a biorobotic approach to uncovering the neural circuits for insect navigation. I will first describe a specific navigational problem and an example of a human-designed (representational) tool used to solve it. I will then describe the neural circuit that appears to underlie the same performance in the insect, which has some striking correspondences. I will then reflect on the role of representational thinking in uncovering and understanding this circuit. I conclude that representational perspectives have an important explanatory role, but that problems arise when that perspective is applied in neurocognitive contexts without appropriate consideration of the actual, embodied, behavioural task that the relevant circuit supports. In other words, the behavioural and mechanistic embedding that is inherent in the biorobotic approach facilitates discovering what is represented.
Tracking your current location relative to a starting point and/or a goal destination has been important throughout human civilization and is particularly difficult when undertaking journeys out of sight of landmarks. It became crucial in the age of ocean-crossing exploration. Before the advent of satellite navigation, a key method used on ships was so-called “dead reckoning,” that is, keeping track of the direction and distance travelled on each leg of the journey and combining them to estimate the current location relative to the starting point. An aid to this process is a tool known as a “traverse board” (May and Holder, 1973), designed as a simple way for sailors to collect the relevant information on the ship’s travel during the standard watch of four hours (figure 24.1). It consists of a wooden board with peg holes arranged in radiating lines to represent each compass direction, and below, further holes in rows to represent speed. For each half-hour of the watch, a linked pair of pegs would be placed to mark the compass bearing and speed of sailing (measured in knots). Afterwards, a navigator could use the eight vectors thus represented to calculate the new position on a chart, and the board could then be reset for the next watch.
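To make the navigator’s end-of-watch calculation concrete, it can be sketched in a few lines of Python. This is purely illustrative: the sample log values, the function name dead_reckon, and the east/north axis convention are my own assumptions, not a record of actual traverse-board practice.

```python
import math

# A hypothetical log from one four-hour watch: eight half-hourly
# records of (compass bearing in degrees, speed in knots), as the
# traverse board would hold them as pairs of pegs.
watch_log = [(0, 4), (0, 5), (45, 5), (45, 6), (90, 4), (45, 3), (0, 5), (0, 4)]

def dead_reckon(log, hours_per_leg=0.5):
    """Sum the legs as vectors: returns the net displacement in
    nautical miles (east, north) relative to the start of the watch."""
    east = north = 0.0
    for bearing, knots in log:
        distance = knots * hours_per_leg  # knots x hours = nautical miles
        east += distance * math.sin(math.radians(bearing))   # bearings from north
        north += distance * math.cos(math.radians(bearing))
    return east, north
```

Summing in any order gives the same result, which is why the order of the recorded legs does not matter for the final position estimate.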
Figure 24.1
The principle of traverse board navigation. A: Actual usage: each connected pair of pegs records the compass heading (above) and speed of travel (below) for one leg of the journey. This can be used to estimate the position with respect to the starting point by summing the vectors. B: Alternative usage: each peg is moved outwards on each leg proportionally to the speed travelled in that direction. The peg pattern is a distributed representation of the vector sum.
It seems self-evident that the traverse board is a representational device. The positions of the pegs stand in for the actual journey made in the last four hours. This tracks a distal property of the world—the ship’s location relative to its starting point—which is not recoverable from an immediate proximal signal. The board exists to perform this function, is designed to be robust with respect to this function, but can nevertheless misrepresent the true state of the world. It has a producer (the sailor) and a consumer (the navigator). It is an “information-bearing structure” that can play a role in various navigational functions: for example, it can be plotted onto a map, used to predict landfall, or used to set a course for home. And if one imagined a sailor with a good memory internalising the board—keeping mental track of the eight compass points and speeds during their watch, and maybe even doing a rough addition in their head to estimate their progress—it would seem a cardinal example of mental representations in the service of intelligent action.
To help provide insight into what follows, I am going to describe some simple variants on how the physical traverse board could have been used (although I have not discovered any actual account of it being used in the way I am about to describe). In conventional use, the speed and direction encoding are separated, but an obvious way to combine them would be to choose the radial line of pegs representing the current heading and count outwards from the centre to place the peg at a point representing the speed. If the same direction occurred during a later segment of the watch, the peg could just be moved farther out—effectively doing some online processing toward the final summation. One advantage would be that it would become relatively easy to see, from a glance at the board, what the approximate total progress of the ship has been.
In fact, it would be possible to convert the traverse board into a physical computer for the sum of the legs of travel, with the following augmentation. Instead of moving just one peg, representing the current compass heading, all the pegs could be moved proportionally to the projection of the current vector onto the directional axis they represent. Mathematically, the correct proportion is given by taking the cosine of the difference between the compass heading and the axis in question, but for the sailors’ convenience, a simple guide showing the relative amounts (for each gradation in speed) by which each peg should be moved could be provided in advance. After the eight half-hour intervals of the watch, the vector sum can be read directly from the board, as the direction will correspond to the outermost peg, and the distance to the difference between the innermost and outermost pegs (which should fall opposite one another). This calculation will only be approximate, due to the coarse encoding imposed by the peg holes, but it will be at least as accurate as the original method. It might seem obvious to the more geometrically astute that, in fact, two axes would be sufficient and would constitute a Cartesian encoding of the vectors, with the equivalent of algebraic summation to obtain the vector sum. If we were indeed to translate the method to an actual computer, this would be the efficient way to solve the problem. On the other hand, the use of redundant vectors makes the information more easily perceptible at a glance for the sailors.
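This augmented use of the board is easy to express in code. The sketch below idealises the board by letting pegs take continuous positions (negative values standing in for pegs moved inward past the centre); the function names and the eight-axis layout follow the description above but are otherwise my own assumptions.

```python
import math

N_DIRECTIONS = 8  # radial axes on the board, 45 degrees apart

def update_board(pegs, heading_deg, speed):
    """Move every peg by the projection of the current travel vector
    onto the axis that peg represents (the cosine rule in the text)."""
    for i in range(N_DIRECTIONS):
        axis_deg = i * 360 / N_DIRECTIONS
        pegs[i] += speed * math.cos(math.radians(heading_deg - axis_deg))
    return pegs

def read_vector_sum(pegs):
    """Read the net travel vector off the board: direction from the
    outermost peg; distance from the difference between the outermost
    and the (opposite) innermost peg, halved because those two pegs
    read +R and -R for a net distance R."""
    i_max = max(range(N_DIRECTIONS), key=lambda i: pegs[i])
    direction_deg = i_max * 360 / N_DIRECTIONS
    distance = (pegs[i_max] - min(pegs)) / 2
    return direction_deg, distance
```

With real peg holes, the readout would be quantised to the nearest hole and the nearest of the eight axes, which is the approximation noted above.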
Note also that if the sailors’ aim in tracking their position was solely to be able to return to their point of origin (e.g., returning to port after a fishing excursion), they could potentially use the traverse board information directly, without explicitly extracting the vector sum. In the original version, a simple strategy would be to sail back along each of the eight legs that has been recorded—this would not need to be in order, as the sum would come out the same. But using the augmented version, this homeward journey would be more efficient—they should simply move in the direction opposite to the outermost peg, which should be the straight direction home. Indeed, even more simply, they should try to maintain this compass bearing, that is, adjust their course to the right or left if the compass indicates they are left or right of the direction indicated on the board. If they continue to update the board throughout the homeward journey, then the difference between the pegs on each radius will gradually be reduced, at each point indicating the remaining distance to travel. When all the pegs are level again, this signals that the origin should have been reached (all legs taken sum to zero). Note that this method (constantly updating the board to always represent the current vector sum relative to the origin) will also automatically correct the course if adverse conditions such as a prevailing wind force a deviation from the desired homeward direction.
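This homing strategy can be simulated in a few lines under the same idealisations (continuous peg positions, perfect compass and speed information). The step size and the tolerance for “all pegs level” are arbitrary choices for the sketch:

```python
import math

N = 8  # radial axes, 45 degrees apart

def update(pegs, heading_deg, speed):
    # Same cosine update rule as for the augmented board.
    for i in range(N):
        pegs[i] += speed * math.cos(math.radians(heading_deg - i * 360 / N))

def home_heading(pegs):
    """Steer opposite to the outermost peg: the straight direction home."""
    i_max = max(range(N), key=lambda i: pegs[i])
    return (i_max * 360 / N + 180) % 360

def at_home(pegs, tol):
    """All pegs level again: the legs taken sum to (approximately) zero."""
    return max(pegs) - min(pegs) < tol

# Outbound journey: two legs away from the origin.
pegs = [0.0] * N
update(pegs, 0, 10)
update(pegs, 90, 10)

# Homeward: repeatedly read the bearing off the board, take a unit
# step in that direction, and keep updating the board as we go.
steps = 0
while not at_home(pegs, tol=0.5) and steps < 1000:
    update(pegs, home_heading(pegs), 1.0)
    steps += 1
```

Because the board is updated during the return, a sideways displacement added inside the loop (a “prevailing wind”) would be corrected automatically, as noted above.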
The use of the traverse board, particularly as an immediate guide to control navigation back to the starting point, might be considered a nice example of “extended cognition” (Clark and Chalmers, 1998). By placing the burden of memory outside the head, and indeed, by using external aids such as a compass to get global directional information, it significantly enhances the human’s cognitive capacity to track their location in space. It might also be discussed as an example of a distributed representation (van Gelder, 1992)—no single peg position “stands in for” the distance or the direction to the origin, but their collective positions encode this information in a manner that can be used effectively to get home, even without explicitly decoding the information. However, the reason I have discussed the traverse board at such length is not to illustrate extended or distributed cognition, but because it provides a surprisingly close analogy to the mechanism in the insect’s brain that supports the same behaviour.
It has been known for a long time that insects (and other animals) are able to perform the equivalent of navigational dead reckoning, that is, to integrate motion along the legs of an outbound path and take the direct path back to the starting location. The terminology most often used is “path integration” (Heinze et al., 2018), and it has been most strikingly demonstrated in desert ants (Müller and Wehner, 1988). Individual ants, after taking a convoluted outward foraging route that can be hundreds of meters in length (Huber and Knaden, 2015), will run home in a bee-line when they discover food. If displaced just after they collect the food, they will follow an exactly parallel path and stop to search after traversing the appropriate distance. This demonstrates that no landmark or chemical cues are required to guide the homeward journey and that both the direction and the length of the “home vector” guide correct action.
This behaviour has been popularly used to argue for internal representations in the insect brain (Gallistel, 1990) yet is also often cited as an alternative explanation to a “cognitive map” for the insect (Cruse and Wehner, 2011). Indeed, it seems possible that the ant might be able to keep an estimate of its location, at any moment, relative to home without necessarily forming any enduring representation of the overall layout of its environment, just as the sailors described above could potentially get home using their traverse board without ever plotting their position on a map. On the other hand, the ability to correctly and consistently update an internal estimate of how to return to a home location that has gone far out of sensory range requires, on the face of it, impressive geometrical operations. Certainly, this exceeds the simple sensorimotor reflexes, fixed action patterns, or stimulus-response associations that might be thought to exhaust the capabilities of a brain that has only ∼ 100,000 neurons. The desert ant has been shown to use celestial information (the sun and polarisation pattern in the sky) to detect its compass direction (Wehner, 2008) and both step-counting and optic flow to estimate distance travelled (Wolf et al., 2018). It can also use memory of the home vector taken from a food location to subsequently return to that location (Wehner et al., 2004). Bees appear able to do vector addition to find shortcuts between food locations (Menzel et al., 2012), and honeybees are famous for being able to communicate a food vector to a nestmate through the waggle dance (Riley et al., 2005). As such, this seems an astonishingly flexible system, yet consensus on whether it constitutes an example of internal representations in the service of cognition is lacking.
One reason for this impasse has been that, despite decades of research into the behaviour, and many algorithmic models, the actual mechanisms supporting path integration in the insect brain were unknown. However, we have recently gained unique insight into the neural circuitry that is almost certainly responsible for the vector operations that occur in the insect brain, both to maintain an accurate home vector estimate and to use it to steer home (Stone et al., 2017; Stankiewicz and Webb, 2020).
As may have been anticipated, the essence of this circuit is an implementation of the “augmented traverse board” I have described above. The central complex of the insect brain has an eight-fold columnar structure that topologically forms a circle (figure 24.2). Identified “compass” neurons (inner ring) in the protocerebral bridge region have been shown to respond according to the heading direction of the animal, relative to visual cues including polarised light. Thus each column corresponds to a geocentric (more precisely, a celestial) compass direction. Transverse neurons (labelled “speed”) connecting to all columns provide input from neurons that respond to optic flow, proportionally to the speed of the insect. A set of identified neurons in each column has precisely the right connections to be able to augment its activity based on these two inputs; that is, the speed signal can be accumulated differently in each neural column according to how closely the current heading direction matches that column’s preferred compass tuning. Thus, following just the same principle as described for the traverse board, at any point in time the distributed pattern of activity across this set of “home vector” neurons (middle ring) will be equivalent to the vector sum of all legs of an outbound path.
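Under the idealisation of exact cosine tuning, the accumulation performed by these cells can be sketched as below. The decode function is not part of the proposed circuit; it is included only to verify that the distributed activity pattern is equivalent to the vector sum of the legs.

```python
import math

N_COLS = 8  # columns of the central complex, preferred directions 45 degrees apart

def step(home_vector, heading_deg, speed):
    """One update: each column accumulates the speed signal weighted by
    how well the current compass heading matches that column's
    preferred direction (idealised here as exact cosine tuning)."""
    return [h + speed * math.cos(math.radians(heading_deg - i * 45))
            for i, h in enumerate(home_vector)]

def decode(home_vector):
    """For checking only: project the population pattern onto two axes
    to recover the Cartesian vector sum of the outbound legs."""
    x = 2 / N_COLS * sum(h * math.cos(math.radians(i * 45))
                         for i, h in enumerate(home_vector))
    y = 2 / N_COLS * sum(h * math.sin(math.radians(i * 45))
                         for i, h in enumerate(home_vector))
    return x, y
```

Note the formal identity with the augmented traverse board: the columns play the role of the radial axes, and the accumulated activity the role of the peg positions.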
Figure 24.2
The neural circuit that underlies path integration in insects (Stone et al., 2017). Note that each element and connection of the illustrated circuit has a direct correspondence in known neuroanatomy. See text for details.
In principle, the downstream circuit could read off the home vector, by finding the column with highest activity (direction) and taking the difference between the highest and lowest activity across the columns (distance). In practice, however, the downstream connections appear to implement a steering circuit that can directly control the animal so that it moves back to the origin. In essence, this occurs through connections that shift the output of the “home vector” neurons by one column to the left and one column to the right, respectively. The “steering” neurons (outer ring) receiving these inputs also receive direct inhibition from the “compass” neurons encoding the current heading of the animal so that their activity reflects the difference of the current heading from a left- or right-shifted home vector. What this tells the insect is whether turning left or right would produce a better match, or in other words, at any moment, how to correct its course to go in the home vector direction. Indeed, these steering neurons connect directly to steering circuitry in the premotor area of the insect brain.
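A schematic version of this comparison, with cosine-shaped activity patterns, single-column shifts, and rectified “steering” cells, can be sketched as follows. The sign convention of the output and all the names are idealisations for illustration, not claims about the precise physiology:

```python
import math

N = 8  # columns, preferred directions 45 degrees apart

def cosine_bump(center_deg, amplitude=1.0):
    """Idealised sinusoidal activity pattern across the columns."""
    return [amplitude * math.cos(math.radians(center_deg - i * 45))
            for i in range(N)]

def steering_signal(home_vector, heading_deg):
    """Compare the home-vector pattern, shifted one column each way,
    against inhibition from the current compass pattern; the sign of
    the summed difference indicates which way to turn."""
    compass = cosine_bump(heading_deg)
    left = home_vector[1:] + home_vector[:1]     # shifted one column
    right = home_vector[-1:] + home_vector[:-1]  # shifted the other way
    left_cells = [max(0.0, a - c) for a, c in zip(left, compass)]    # rectified
    right_cells = [max(0.0, a - c) for a, c in zip(right, compass)]  # rectified
    return sum(right_cells) - sum(left_cells)
```

When the heading already matches the home direction, the two shifted comparisons balance and the signal is zero; headings off to either side produce signals of opposite sign, which is what a downstream premotor circuit needs to correct the course.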
There are a number of additional subtleties that optimise this circuit for the function of accurate path integration. For example, inhibitory connections between the compass neurons appear to shape the response across them to closely approximate a cosine—the correct mathematical mapping of the vector heading onto the axis corresponding to each column (Lyu et al., 2020). The optic flow neurons are in fact responsive to translation of the animal in orthogonal directions, which allows the system as a whole to track the actual ground velocity even if the animal is not moving in the direction it is facing, for example, during side-slip in flight (Stone et al., 2017). The constant update of the circuit, including during the homeward journey, results in an emergent search behaviour centred on the home location once the origin of the journey has been reached. It has also been possible to show that very minimal additions to this circuit (as yet unconfirmed neurophysiologically) would support the ability to memorise and return to food locations and to do the vector addition needed to take shortcuts between them (Le Moël et al., 2019).
I have endeavoured in the above description to avoid representational language as far as possible, but this is a struggle, and certainly biologists have no hesitation in saying that each column “represents” a compass direction, that the activity “encodes” the distance travelled, and so on. There seems to be the most direct (and productive) correspondence possible between the function of the traverse board plus ship’s crew in controlling the ship and the central complex plus the insect’s sensory and motor neurons in controlling the insect’s body. The understanding of the computation required—that a “home vector” representing position must be calculated by accumulating velocity, and that possession of such a vector can be used flexibly in alternative behaviours such as shortcuts and the communication of salient locations—would seem a vindication of adopting a top-down representational approach to the explanation of behaviour. Indeed, many of the salient neural mechanisms that have now been mapped in the insect brain were anticipated in a purely hypothetical model proposed some twenty years earlier (Wittmann and Schwegler, 1995), in which it was suggested that the best way to represent and process vectors in a neural system for path integration would be as distributed sinusoidal activity across a neural array representing different directions.
The close connection that has now been discovered between this hypothetical network and the actual neural circuitry was highly dependent on neurophysiological results derived by the classic method: testing stimulus changes to which individual neurons were responsive, and hence characterising their functional role as representing a particular property required for a high-level computation. In this case, the clearest example is the discovery of the “compass” encoding of preferred polarisation direction responses across successive columns of the protocerebral bridge (Heinze and Homberg, 2007). Thus the bottom-up approach suggested by the representational viewpoint also seems to have been vindicated.
Moreover, this representational account seems more productive than considering the whole as a coupled dynamical system (Van Gelder, 1998). We might suggest an alternative metaphor to the traverse board, in which an agent is connected to its environment by a set of springs along different axes. Motion of the animal stores energy in the springs as the agent moves away from the (stable) home position, and when released, the agent should be pulled directly back to this attractor. So perhaps the whole neural system I have described can be thought of as a purely physical system with the requisite dynamics to create such an energy landscape. However, the transduction and transformation involved in going from multiple sensory inputs to neural firing patterns creates a significant discontinuity in this physical interpretation. A neural activation pattern is being set up (with some effort) to parallel an external state of affairs that is of relevance to the animal. In fact, there is separate processing in the brain of information about direction and speed, which are only subsequently combined. There are points in the system where the physiological mechanism seems arbitrary, in that it could have been implemented differently as long as it preserved the informational structure. For example, it remains uncertain whether the incremental accumulation of activity in each direction in the “home vector” neurons is a property intrinsic to individual neurons or established through small recurrent networks. This discontinuity and arbitrariness seem characteristic of representational systems and can only be subsumed in a dynamical account if the latter remains at a very high level—one that seems more descriptive than explanatory.
The representational view is also clearly generative for improving our future understanding of this system. Taking again the example of the substrate for activity accumulation, an obvious “representational” question is what accuracy and stability in the memory of the accumulated total are required to support the observed behaviour of the animal. This may make some neural substrates more plausible than others, depending on their resolution and tendency to drift. Similarly, the assumption that the neural activity pattern should represent the geocentric coordinates of the animal makes it relevant to look for a mechanism that allows time compensation of the celestial compass input (Pfeiffer and Homberg, 2007), given that sky cues change relative to the earth throughout the day.
The foregoing discussion would seem strongly supportive of the representational point of view in cognitive neuroscience, but there are some crucial caveats. A major strength of the insect navigation field is that it has a strong grounding, over decades of study, in the actual behavioural capabilities of the animal in its natural habitat (Cheng and Freas, 2015). As a consequence, when constructing the top-down computational-level description, it is less likely that we will assert the need for representational capabilities in the brain that are not actually necessary to account for the behaviour. This is also helped by the fact that we are dealing with insects and hence have an a priori tendency towards simpler explanations of their navigational abilities, rather than anthropomorphically assuming the equivalent of an internal map.
Nevertheless, some of the “computational level” assumptions made about this system have led research astray. For example, it might seem obvious that a home vector, indicating the distance and direction to the nest, should be neurally encoded in ego-centric polar coordinates: how far the animal needs to turn and how far it needs to run to get home. Indeed, one of the first explicit neural models of ant path integration implemented separate circuits for distance and direction in a polar coordinate encoding (Hartmann and Wehner, 1995) (and this was the first model of path integration we attempted to test on a robot, with rather ineffective results (Chapman, 1998)). However, both some earlier insights (Mittelstaedt and Mittelstaedt, 1973) and more recent thorough analysis (Vickerstaff and Cheung, 2010) show that both ego-centric and polar representations of path integration have substantial limitations as a computational method for keeping track of the nest.
Similarly, although some key insights into the circuit came from the conventional neurophysiological method of probing, in a non-behaving animal, the response of single neurons to well-controlled stimuli, some results from this approach were actively misleading. The use of an artificial linear polariser rather than the actual polarisation pattern of the sky meant that the “compass” was assumed to range from 0 to 180 degrees across one half of the protocerebral bridge, with the eight columns encoding 22.5-degree increments (Homberg et al., 2011). Our breakthrough in modelling the circuit came only when this assumption was abandoned and a 360-degree encoding with 45-degree increments (seemingly in direct contradiction to the neurophysiological results (Heinze and Homberg, 2007)) was used. As it happened, in parallel, experiments using visual bars and light dots as the orientation stimulus started to support this alternative (Seelig and Jayaraman, 2015). Subsequently, we have shown how a 360-degree encoding could indeed be obtained from the whole-sky pattern of polarisation, in a model that can also reproduce the original 180-degree results when an impoverished experimental stimulus is presented (Gkanias et al., 2019).
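The alternative encoding can be sketched as a toy population of eight columns with preferred directions spaced 45 degrees apart, covering the full 360 degrees. The rectified cosine tuning and population-vector readout here are assumptions for illustration, not the anatomical circuit:

```python
import math

N_COLUMNS = 8  # eight columns, one hemisphere of the protocerebral bridge
PREFERRED = [k * 45.0 for k in range(N_COLUMNS)]  # 45-degree increments

def compass_activity(heading_deg):
    """Rectified cosine tuning of each column for a given heading."""
    return [max(0.0, math.cos(math.radians(heading_deg - p)))
            for p in PREFERRED]

def decode_heading(activity):
    """Population-vector readout: unambiguous over the full 360 degrees."""
    x = sum(a * math.cos(math.radians(p)) for a, p in zip(activity, PREFERRED))
    y = sum(a * math.sin(math.radians(p)) for a, p in zip(activity, PREFERRED))
    return math.degrees(math.atan2(y, x)) % 360.0

# Headings 180 degrees apart produce distinct population patterns,
# which a 0-to-180-degree polarisation-only encoding cannot distinguish.
assert decode_heading(compass_activity(30.0)) != decode_heading(compass_activity(210.0))
```

Under a linear polariser the stimulus itself is ambiguous modulo 180 degrees, so such a population probed that way would appear to tile only half the circle, which is one way the original assumption could arise.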
The key message here is that understanding the neural circuit needed grounding in the actual behaviour of the animal (and in a robot imitation). This is fundamentally different from assuming that the function of the neural circuit is to reconstruct an internal analog of the external world—even if it is sometimes the case that this is just what the circuit needs to do to support the behaviour. A neural model of path integration should be a model that does the behaviour of path integration, not one that merely produces a neural correlate of distance and direction. This point of view was fundamental to “cracking” the circuit. I would argue that for much of current cognitive neuroscience, the failure to think this way is a significant barrier to progress. Arguably, an example is the rodent “place cell,” where the name has probably obscured its actual function (Eichenbaum et al., 1999; Dudchenko and Wood, 2015). Indeed, the whole neuroscience research programme of “decoding,” in which the aim is to show that the stimulus stream experienced by an animal can be reconstructed from the pattern of neural activity (e.g., Kay et al., 2008), makes little sense when pursued without understanding what is relevant in the stimulus for behaviour (Ritchie et al., 2019).
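A minimal sketch of what "a model that does the behaviour" might mean: an agent wanders outbound while integrating its path, then closes the loop by steering home. Everything here (the random outbound walk, the fixed step size) is a toy assumption; the point is that the model's output is homing behaviour, not a decoded variable handed to an experimenter:

```python
import math
import random

def outbound_and_home(n_steps=50, step=1.0, seed=0):
    """Wander out while integrating the path, then steer home by
    following the integrated home vector until within one step."""
    rng = random.Random(seed)
    x = y = 0.0
    # Outbound phase: random walk, with path integration running throughout.
    for _ in range(n_steps):
        heading = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
    # Homing phase: the state drives steering directly; distance and
    # direction are never reported, only acted upon.
    while math.hypot(x, y) > step:
        heading = math.atan2(-y, -x)  # steer along the home vector
        x += step * math.cos(heading)
        y += step * math.sin(heading)
    return math.hypot(x, y)  # final distance from the nest

final = outbound_and_home()
assert final <= 1.0  # the agent ends within one step of home
```

The success criterion is behavioural (does the agent get home?), which is the kind of test a robot implementation imposes automatically.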
Further, this view of the brain as engaged in (behaviour-neutral) reconstruction of the external world is also sometimes a barrier to progress in AI. Indeed, there is a strange bifurcation in recent AI approaches. On the one hand, Deep Learning and related methods would appear to be extreme examples of paying attention only to the desired outcome (the behaviour), expressed as an objective function, and using a largely mechanism-neutral method to generate a working neural solution, unbiased by any representational assumptions. On the other hand, these methods are frequently put to use within larger architectures that still fully subscribe to the internal world model assumption. For example, in robotics, Deep Learning is often used to produce a highly veridical mapping of 3D space, as a precursor to taking the simplest action such as a step forward or a reach towards an object (Yan et al., 2018; Yeboah et al., 2018).
Bringing this back to our starting point in insect biorobotics as a research agenda for Mind Design, let me reflect on the lessons learned. Both the robotic point of view and the choice of a simpler animal to study have been important. We might still debate whether insects have minds (Klein and Barron, 2016; Birch, 2020), but there is no question that they have behavioural capacities such as navigation that are considered prime exemplars of cognition in other species. Establishing a complete explanation of how rodent navigation is neurally implemented would seem highly relevant to philosophical questions around the mechanisms of intelligence; but as we are approaching such an explanation more rapidly for the insects, it seems worth more attention. Our greater insight into the natural conditions of the behaviour and our relative preference for simple mechanistic explanations for insects have helped to maintain focus on how rich environmental interaction and embodied cognition reduce the complexity of the internal processing required. Having robot-building as a requirement for our models similarly pushes us towards thinking in mechanistic rather than representational terms and towards making the most of physical as well as neural computation. Perhaps surprisingly, given the starting point, the emerging picture from this research seems to vindicate representational accounts of cognitive neuroscience. Nevertheless, many of the arguments made by Brooks thirty years ago remain relevant and can help us avoid pitfalls in making assumptions about the degree, nature, and role of representation in intelligent behaviour.
Seven of the twenty-four essays in this volume were retained from the second edition of Mind Design (Haugeland’s introduction, Newell and Simon, Turing, Dennett, Searle, Rumelhart, and Brooks). Eleven more were originally printed elsewhere, and appear in this edition with varying degrees of editing and abridgment (Marr, Levesque, Russell, Fodor, Boden, Egan, Clark, Pearl, Churchland and Sejnowski, Cowie and Woodward, and Haugeland). The remaining six (Craver and Klein, Maley, Mitchell, Buckner, Haas, and Webb) were written for this edition.
This volume was partly supported by funding from an Australian National University Futures Grant (to C.K.).
1. CRAVER AND KLEIN
“Introduction to Mind Design III” and the introductions to the six parts appear for the first time in this edition.
2. HAUGELAND
“What Is Mind Design?” was written as the introduction to Mind Design II. It is included with light abridgment to remove intertextual references to works no longer included in the present volume. It is reprinted here by permission of the MIT Press.
3. NEWELL AND SIMON
“Computer Science as Empirical Inquiry: Symbols and Search” was first published as Newell and Simon (1976). It is reprinted here by permission of the ACM.
4. MARR
The excerpts from Marr’s Vision are originally found in the introduction and chapters 1 and 6 of Marr (1982). They are reprinted here by permission of the MIT Press.
5. MALEY
“The Analog Alternative” appears here for the first time. It is printed with permission of the author.
6. TURING
“Computing Machinery and Intelligence” was first published as Turing (1950). It is reprinted here by permission of Mind.
7. LEVESQUE
“On Our Best Behaviour” was originally delivered at IJCAI-13. We include here the version that appeared as Levesque (2014). It is reprinted here by permission of Artificial Intelligence.
Author acknowledgments: This paper is a written version of the Research Excellence Lecture presented in Beijing at the IJCAI-13 conference. Thanks to Vaishak Belle and Ernie Davis for helpful comments.
8. RUSSELL
“Rationality and Intelligence” was originally printed as Russell (1997). It is reprinted here by permission of Artificial Intelligence.
9. FODOR
These excerpts from The Modularity of Mind are originally found in part IV, “Central Systems,” of Fodor (1983). They are reprinted here by permission of the MIT Press.
10. MITCHELL
“Why AI Is Harder than We Think” appears here for the first time. It is printed with permission of the author.
Author acknowledgments: This material is based upon work supported by the National Science Foundation under Grant No. 2020103. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. This work was also supported by the Santa Fe Institute. I am grateful to Philip Ball, Rodney Brooks, Daniel Dennett, Stephanie Forrest, Douglas Hofstadter, Tyler Millhouse, Melanie Moses, and Jacob Springer for comments on an earlier draft of this manuscript.
11. DENNETT
“True Believers: The Intentional Strategy and Why It Works” was originally presented as a Herbert Spencer Lecture at Oxford in November 1979, and was first published in Heath (1981); it is reprinted in Dennett (1987b), and is included here by permission of the author and the MIT Press.
12. SEARLE
“Minds, Brains, and Programs” was first published as Searle (1980). It is reprinted here by permission of the publisher.
Author acknowledgments: The author acknowledges his debts to a rather large number of people for discussion of these matters and for their patient attempts to overcome his ignorance of artificial intelligence (AI)—with special thanks to Ned Block, Hubert Dreyfus, John Haugeland, Roger Schank, Robert Wilensky, and Terry Winograd.
13. BODEN
“Escaping from the Chinese Room” was originally published in Boden (1988, 238–251). It is reprinted here with permission of Cambridge University Press.
14. EGAN
“Computation and Content” was originally published as Egan (1995). It is reprinted here with permission of Duke University Press.
Author acknowledgments: Thanks to Kent Bach, Noam Chomsky, Patricia Pitcher, Robert Matthews, Colin McGinn, Rob Wilson, and the editors for helpful comments on earlier versions of this paper.
15. BUCKNER
“Transformational Abstraction in Deep Neural Networks” appears here for the first time. It is printed with permission of the author.
16. HAAS
“The Evaluative Mind” appears here for the first time. It is printed with permission of the author.
17. CLARK
“Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science” is an abridgment of Clark (2013). It is printed here with permission of the publisher.
18. PEARL
“Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution” was originally presented as a keynote talk at WSDM ’18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. It is printed here with permission of the author.
Author acknowledgments: This research was supported in part by grants from the Defense Advanced Research Projects Agency [#W911NF-16-057], the National Science Foundation [#IIS-1302448, #IIS-1527490, and #IIS-1704932], and the Office of Naval Research [#N00014-17-S-B001].
19. RUMELHART
“The Architecture of Mind: A Connectionist Approach” first appeared in Posner (1989). It is reprinted here by permission of the MIT Press.
20. CHURCHLAND AND SEJNOWSKI
The excerpts from Churchland and Sejnowski’s The Computational Brain originally appeared in Churchland and Sejnowski (1992). They are reprinted here by permission of the MIT Press.
21. COWIE AND WOODWARD
“The Mind Is Not (Just) a System of Modules Shaped (Just) by Natural Selection” originally appeared as Cowie and Woodward (2004). It is reprinted here with permission of the publisher.
22. HAUGELAND
“Mind Embodied and Embedded” originally appeared in Acta Philosophica Fennica and was reprinted in Haugeland (1997a). It appears here with permission of the Philosophical Society of Finland.
23. BROOKS
“Intelligence without Representation,” in its original form, was first presented at the Workshop on Foundations of Artificial Intelligence at Endicott House in June 1987, and subsequently appeared in Artificial Intelligence 47: 139–159 (1991). The version in this volume differs from that earlier one in two main ways: about thirty paragraphs have been added, and ten deleted. The added paragraphs now make up subsection 6.3 and section 8; they are extracted from “Intelligence without Reason,” by Rodney A. Brooks, MIT AI Memo #1293 (1991), later published in the proceedings of the 1991 International Joint Conference on Artificial Intelligence. The deleted paragraphs were all taken from the final section (8.1–8.3 in the earlier version). There have also been a few other changes, including new photos. This version, compiled by John Haugeland, is published by permission of the author.
Author acknowledgments: Phil Agre, David Chapman, Peter Cudhea, Anita Flynn, Ian Horswell, David Kirsh, Pattie Maes, Thomas Marill, Maja Mataric, and Lynn Parker were helpful in the preparation of one or both of the two essays that were combined to make this one. The research described here was done at the AI Laboratory at the Massachusetts Institute of Technology (MIT). Support has been provided by an IBM Faculty Development Award, by grants from the Systems Development Foundation, the Hughes AI Center, Siemens Corporation, and Mazda Corporation, by the University Research Initiative under ONR contract N00014-86-K-0685, and by ARPA under ONR contract N00014-85-K-0124.
24. WEBB
“What Does Biorobotics Offer Philosophy? A Tale of Two Navigation Systems” appears here for the first time. It is printed with permission of the author.